We consider partly linear transformation models applied to current
status data. The unknown quantities are the transformation
function, a linear regression parameter and a nonparametric regres- 
sion effect. It is shown that the penalized MLE for the regression pa- 
rameter is asymptotically normal and efficient and converges at the 
parametric rate, although the penalized MLEs for the transformation
function and nonparametric regression effect are only $n^{1/3}$-consistent.
Inference for the regression parameter based on a block jackknife is 
investigated. We also study computational issues and demonstrate 
the proposed methodology with a simulation study. The transfor- 
mation models and partly linear regression terms, coupled with new 
estimation and inference techniques, provide flexible alternatives to 
the Cox model for current status data analysis. 

1. Introduction. Partly linear transformation models are flexible
semiparametric regression models where a continuous outcome $U$, conditional on
covariates $Z \in \mathbb{R}^d$ and $W \in \mathbb{R}$, is modeled as

(1.1) $H(U) = \beta'Z + h(W) + e$,

where $H$ is an unknown nondecreasing transformation, $h$ is an unknown
smooth function, and $e$ has a known distribution $F$ with support $\mathbb{R}$. The
setting we focus on in this paper is when $U$ is not observed directly, but
only its current status is observed at a random censoring point $V \in \mathbb{R}$. More
specifically, we observe $X = (V, \Delta, Z, W)$, where $\Delta = 1\{U \le V\}$. We assume
that $U$ and $V$ are independent given $(Z, W)$. Although it is not difficult to
extend our approach to allow multivariate $h$, we restrict our attention to
univariate $h$ for simplicity of exposition.
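
To make the observed data structure concrete, the following minimal sketch simulates current status observations $(V, \Delta, Z, W)$ from model (1.1). Every specific choice here is an illustrative assumption rather than something taken from the paper: $H$ is the identity, $h(w) = \sin(2\pi w)$, $\beta$ is scalar, the error is standard logistic, and $Z$, $W$ and $V$ are uniform. The function name is hypothetical.

```python
import math
import random


def simulate_current_status(n, beta=1.0, seed=0):
    """Simulate n observations (V, Delta, Z, W) from model (1.1).

    Illustrative assumptions: H(u) = u, h(w) = sin(2*pi*w),
    Z ~ Uniform(-1, 1), W ~ Uniform(0, 1), V ~ Uniform(-2, 2),
    and e follows the standard logistic distribution.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        w = rng.uniform(0.0, 1.0)
        v = rng.uniform(-2.0, 2.0)
        # Logistic draw by inverting the CDF; clamp to avoid log(0).
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        e = math.log(p / (1.0 - p))
        u = beta * z + math.sin(2.0 * math.pi * w) + e  # H(U) = U here
        delta = 1 if u <= v else 0  # U itself is discarded
        data.append((v, delta, z, w))
    return data
```

Only the indicator `delta` is retained for each subject, which is what distinguishes current status data from right censored data, where the outcome is observed exactly for uncensored subjects.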

Linear transformation models, having the form (1.1) without the
nonparametric regression term $h$, have a long history. A parametric version
of this model, with $H$ specified up to a finite-dimensional parameter vector,
was initially investigated by Box and Cox [8]. Dabrowska and Doksum [13]
studied the generalized odds-rate model for the two-sample problem, a special
case of transformation models with nonparametric $H$. Cheng, Wei and Ying
[10] proposed a class of estimating functions for the regression parameter
$\beta$, with possibly right censored observations, and verified that the
resulting estimator is asymptotically normal. Bickel and Ritov [7] developed
efficient methods for linear transformation models with covariates in the
uncensored setting.

The model (1.1) can be readily applied to a failure time $T$ by letting
$U = \log T$. The partly linear Cox model, for example, is obtained by
changing the sign of $\beta$ and $h$, letting $F(s) = 1 - \exp\{-e^s\}$, and by
taking $H(u) = \log A(e^u)$, where $A$ is an unspecified integrated baseline
hazard. More specifically, the hazard function of $T$, given the covariates
$Z = z$ and $W = w$, is assumed to have the form

(1.2) $\lambda(t \mid z, w) = \exp\{\tilde\beta' z + \tilde h(w)\}\, a(t)$,

where

$\tilde\beta = -\beta$, $\tilde h = -h$,

and where $a$ is the baseline hazard. This model has been studied for right
censored data by Huang [23]. The partly linear proportional odds survival
regression model has the same form only with $F(s) = e^s/(1 + e^s)$. If $Y$ is a
current status time, or "case 1" interval censoring time, then letting
$V = \log Y$ will yield the data structure described in the first paragraph.
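
The correspondence between (1.1) and the Cox model can be checked numerically. The sketch below compares the distribution function of $T$ implied by (1.1), with the extreme value error and $H(u) = \log A(e^u)$, against the one implied by a proportional hazards model whose linear predictor has the opposite sign. The Weibull-type cumulative hazard $A(t) = t^{1.5}$ and all function names are assumptions made only for illustration.

```python
import math


def extreme_value_cdf(s):
    """F(s) = 1 - exp(-e^s)."""
    return 1.0 - math.exp(-math.exp(s))


def cdf_via_transformation(t, lin_pred, cum_hazard):
    """P(T <= t) from (1.1) with U = log T and H(u) = log A(e^u).

    lin_pred stands for beta'z + h(w).
    """
    h_of_log_t = math.log(cum_hazard(t))  # H(log t) = log A(t)
    return extreme_value_cdf(h_of_log_t - lin_pred)


def cdf_via_cox(t, lin_pred, cum_hazard):
    """P(T <= t) when the cumulative hazard is A(t) * exp(-lin_pred)."""
    return 1.0 - math.exp(-cum_hazard(t) * math.exp(-lin_pred))


def weibull_cum_hazard(t):
    """Illustrative integrated baseline hazard A(t) = t^1.5."""
    return t ** 1.5
```

The two functions agree for every $t > 0$ because $\exp\{\log A(t) - \beta'z - h(w)\} = A(t)\exp\{-\beta'z - h(w)\}$, which is why the sign of the regression terms changes when passing from (1.1) to (1.2).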

Statistical methodology for current status data also has a long history. An 
important early example of current status data comes from tumor studies 
in animals, where the time of tumor onset is of interest, but not directly 
observable, as discussed in [15]. Current status data may occur due to study 
design or measurement limitations. Examples of such data arise in several 
fields, including demography, epidemiology, econometrics and bioassay. Re- 
search into statistical methods for current status data appears to have begun 
with the paper by Ayer, Brunk, Ewing, Reid and Silverman [2] on estimat- 
ing a distribution function from a single sample. Other early approaches to 
current status data analysis include the use of generalized linear regression 
models [3, 14, 24, 50]. Andrews, van der Laan and Robins [1] investigate lo- 
cally efficient estimators of parametric regression models with current status 
data. 
Over the past decade, the fascinating asymptotic properties of estimators
in the single sample case [19], and the similarity to more general kinds
of interval censoring, have kindled significant interest in nonparametric ap- 
proaches to estimation in current status regression [4, 16, 17, 20, 22, 28, 32, 
34, 35, 36, 38, 40, 41, 42, 44, 45, 46, 52]. 

Our goal in this paper is to study estimation in the model (1.1), with 
special attention on inference for $\beta$. Our results extend the nonparametric
likelihood-based approach of Huang [22, 23] in two ways. First, a smooth 
nonparametric covariate effect is added to the Cox model for current status 
data. Second, results are obtained for general transformation models with 
arbitrary but known residual distribution F. The approach we take is to use 
nonparametric maximum penalized log-likelihood. 

Several interesting issues arise from carrying through this extension. First,
the convergence rates for the estimators of $h$ and $H$, $\hat h_n$ and $\hat H_n$, appear to
interfere with each other so as to require oversmoothing of $\hat h_n$ and thus
force the convergence rates to both equal $O_p(n^{-1/3})$. Second, $\hat H_n$ has a bias
which does not vanish asymptotically, even when no regression terms are
present. This bias arises from an assumption we make that the support
of the current status value $V$ is a finite interval. This assumption is also
made by Huang [22]. In spite of this persistent bias, $\hat H_n$ is $L_2$ consistent and
bounded in probability, and thus is sufficiently consistent to enable weak
convergence of the estimator of $\beta$. Third, inference for $\beta$ is challenging
because estimation of the covariance directly is impractical and there exists
no generally applicable method of inference for penalized estimators of partly
linear models. The likelihood ratio expansion results of Murphy and van der
Vaart [31] cannot be used in our setting since the penalized component of the
objective function is larger than $O_p(n^{-1})$ and thus not negligible in the limit.
To resolve the inference problem, we use a block jackknife estimator which
is computationally simple and which applies, in general, to asymptotically
linear statistics.

There is an interesting connection between the model (1.1) and
semiparametric binary choice models studied in econometrics [9, 11, 12, 21, 26].
The model (1.1) can also be expressed as the probability of a consumer choosing
"$\Delta = 1$" instead of "$\Delta = 0$," given covariates $(Z, W, V)$, via the expression

(1.3) $P[\Delta = 1 \mid Z, W, V] = F(\beta'Z + h(W) + H(V))$,

where $H$ is assumed to be a monotone covariate effect, $F$ is a known function,
and the other covariate effects are as defined previously. Without the $H$
term, (1.3) is precisely the model studied by Härdle, Mammen and Müller
[21]. This connection has also been observed in other settings of current
status data study, for example, [1, 41, 42, 46].

The next section, Section 2, presents the data and model assumptions,
along with several examples of residual distributions $F$ which satisfy the
given requirements. The maximum penalized log-likelihood estimation
procedure is presented in Section 3. Results on consistency of the estimators and
the persistent bias in $\hat H_n$ are given in Section 4. Section 5 presents results on
rates of convergence for parameter estimators, and Section 6 presents
asymptotic normality and efficiency of $\hat\beta_n$. A block jackknife inference procedure
for $\beta$ is presented in Section 7. Several computational issues are discussed
in Section 8, and a simulation study evaluating the moderate sample size
performance of the proposed methods is given in Section 9. Proofs are given
in Section 10.

2. The data setup and model assumptions. 

2.1. Data and model assumptions. The data $\{X_i = (V_i, \Delta_i, Z_i, W_i), i =
1, \ldots, n\}$ consist of $n$ i.i.d. realizations of $X = (V, \Delta, Z, W)$, as described
in the Introduction, generated by the model (1.1). Recall that $\Delta = 1\{U \le V\}$,
where $U$ is a real-valued outcome of interest. We make the following
additional assumptions about the covariates and censoring distribution.

A1. (a) The covariate $Z$ belongs to a bounded subset $\mathcal{Z} \subset \mathbb{R}^d$. (b) The
support for $W$ is $[a, b]$, where $-\infty < a < b < \infty$. (c) The support for $V$
is an interval $[l_v, u_v]$, where $-\infty < l_v < u_v < \infty$.

A2. $E\,\mathrm{var}[Z \mid V, W]$ is positive definite.

A3. With probability > 0, the conditional distribution of $W$ given $V$
dominates the unconditional distribution of $W$.

A4. $U$ and $V$ are independent given $(Z, W)$.

Define the function class $\mathcal{Q}_\nu = \{h : [a, b] \mapsto \mathbb{R} \text{ with } J(h) < \infty\}$, where
$J^2(h) = \int_a^b (h^{(\nu)}(w))^2\,dw$ for some positive integer $\nu$, and where $h^{(j)}$ is the
$j$th derivative of $h$. Also define the following subset of $\mathcal{Q}_\nu$:
$\mathcal{Q}_\nu^c = \{h : h \in \mathcal{Q}_\nu, \sup_{w \in [a, b]} |h(w)| < c \text{ and } E[h(W)] = 0\}$, where $c$ is a
positive constant. When $c = \infty$, the superscript is omitted. We make the
following additional model assumptions.

B1. The distribution of $U$ given the covariates has the transformation model
form given in (1.1).

B2. The true regression parameter $\beta_0$ belongs to a known, bounded open
subset $B_0$ of $\mathbb{R}^d$.

B3. The true nonparametric covariate effect $h_0 \in \mathcal{Q}_\nu^{c_0}$, for some known
$c_0 < \infty$.

B4. The true transformation $H_0 : \mathbb{R} \mapsto \mathbb{R}$ is strictly monotone increasing and
bounded on compacts.

B5. (a) The residual error distribution $F$ is known. (b) $F$ has first and
second derivatives $f$ and $\dot f$, respectively, where the support of $f$ is $\mathbb{R}$ and
where $|\dot f|$ is bounded. (c) For each compact subset of $\mathbb{R}$, there exist
constants $c \in (0, \infty)$ and $\alpha \in (1/2, 1]$ and an increasing isomorphic function
$\zeta : [0, 1] \mapsto [0, 1]$ so that

sup sup + + <ce", 

for all $\epsilon \in (0, 1)$. (d) $[f^2(u) - \dot f(u)F(u)] \wedge [f^2(u) + \dot f(u)(1 - F(u))] > 0$,
for all $u \in \mathbb{R}$.

Remark 1. In condition B3 it is assumed that the unknown nonparametric
covariate effect $h_0$ is bounded by $c_0$. In the theoretical proofs and
numerical calculations the exact value of $c_0$ is not necessary. Instead, only
the boundedness condition is needed. The condition B5(c) is used to control
the entropy of the model in order to obtain consistency. Condition B5(d)
ensures concavity of the function $s \mapsto \delta \log F(s) + (1 - \delta)\log(1 - F(s))$ for
$\delta = 0, 1$. This concavity is used in the proof of Theorem 2 below to
establish that the estimator of the transformation function is bounded above and
below in probability.

Remark 2. It is not hard to verify that if $u \mapsto F(u)$ satisfies
B5(b)-B5(d), then for any $a \in (0, \infty)$ and $b \in \mathbb{R}$, $u \mapsto F(au + b)$ also satisfies
B5(b)-B5(d).

2.2. Examples of transformation models. The following are several
examples of residual error distribution functions.

1. $F(u) = 1 - \exp[-e^u]$ is the extreme value distribution and corresponds to
the complementary log-log transformation.

2. $F(u) = e^u[1 + e^u]^{-1}$ is the logistic distribution and corresponds to the
logit transformation.

3. $F(u) = 1 - [1 + \gamma e^u]^{-1/\gamma}$ is a Pareto distribution with parameter
$\gamma \in [0, \infty)$ and corresponds to the odds-rate transformation family. Taking the
limit as $\gamma \downarrow 0$ yields the extreme value distribution, while $\gamma = 1$ gives the
logistic distribution.

4. $F(u) = \Phi(u) = (2\pi)^{-1/2} \int_{-\infty}^{u} \exp[-s^2/2]\,ds$ is the standard normal
distribution which corresponds to the probit link.

5. $F(u) = \gamma[2\Gamma(1/\gamma)]^{-1} \int_{-\infty}^{u} \exp[-|s|^\gamma]\,ds$, for $\gamma \in [1, \infty)$, is a family of
distributions which, after appropriate rescaling as justified in Remark 2 above,
includes $\Phi$ (corresponding to $\gamma = 2$).

6. $F(u) = 1/2 + \pi^{-1}\tan^{-1}(u)$ is the Cauchy distribution.

The following lemma gives a few examples which satisfy B5.

Lemma 1. Examples 1-4 and example 5 with $\gamma \in (1, \infty)$ satisfy
conditions B5(b)-B5(d).
Remark 3. Actually, many residual error distributions satisfy B5, but
example 5 with $\gamma = 1$ and example 6 do not satisfy condition B5(d) [although
they do satisfy B5(b) and B5(c)]. Consider first the Cauchy example. To see
that B5(d) is not satisfied, note that, for sufficiently large $u$, $1 - F(u) =
[\pi u]^{-1} + o(1/u)$. Since $f(u) = [\pi(1 + u^2)]^{-1}$ and $\dot f(u) = -2\pi u f^2(u)$, we have
for $u > 0$ that $\dot f(u)(1 - F(u)) + f^2(u) = -f^2(u)(1 + O(1/u))$ and, thus, B5(d)
is not satisfied. In example 5 with $\gamma = 1$, we have that $\dot f(u)(1 - F(u)) +
f^2(u) = 0$ for all $u > 0$ and, thus, B5(d) is again not satisfied.
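
Condition B5(d) is easy to probe numerically for a candidate error distribution. The sketch below evaluates the two expressions in B5(d) on a finite grid for the extreme value, logistic and Cauchy distributions. The grid $[-5, 3]$, the helper names and the use of a grid at all are assumptions made for illustration; a grid check can expose a violation but cannot prove the condition on all of $\mathbb{R}$.

```python
import math


def b5d_margin(F, f, fdot, u):
    """Smaller of the two expressions in condition B5(d) at the point u."""
    first = f(u) ** 2 - fdot(u) * F(u)
    second = f(u) ** 2 + fdot(u) * (1.0 - F(u))
    return min(first, second)


# Example 1: extreme value distribution.
def ev_F(u): return -math.expm1(-math.exp(u))
def ev_f(u): return math.exp(u - math.exp(u))
def ev_fdot(u): return ev_f(u) * (1.0 - math.exp(u))


# Example 2: logistic distribution.
def logistic_F(u): return 1.0 / (1.0 + math.exp(-u))
def logistic_f(u): return logistic_F(u) * (1.0 - logistic_F(u))
def logistic_fdot(u): return logistic_f(u) * (1.0 - 2.0 * logistic_F(u))


# Example 6: Cauchy distribution.
def cauchy_F(u): return 0.5 + math.atan(u) / math.pi
def cauchy_f(u): return 1.0 / (math.pi * (1.0 + u * u))
def cauchy_fdot(u): return -2.0 * math.pi * u * cauchy_f(u) ** 2


def holds_on_grid(F, f, fdot, lo=-5.0, hi=3.0, steps=80):
    """True if the B5(d) margin is positive at every grid point."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return all(b5d_margin(F, f, fdot, u) > 0.0 for u in grid)
```

On this grid the extreme value and logistic distributions pass, while the Cauchy distribution fails at moderately large $u$, in line with Lemma 1 and Remark 3.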

Condition B5(c) can sometimes be assessed directly for a given error dis- 
tribution. In the proof of Lemma 1 this approach is taken to verify B5(c) 
for examples 1-3. In other settings the condition is hard to use directly, 
and the sufficient conditions given in the following lemma are more easily 
established. This approach is used to establish B5(c) for examples 4 and 5
[for $\gamma \in (1, \infty)$].

Lemma 2. Let F satisfy condition B5(b) and: 

(i) For every r G [0,00), there exists a Ci '^ G [0,00) such that f(—u) < 
< f(u) and 

f(-u)f(-u + r) - f(-u)f(-u + r) < < f(u)f(u - r) - f(u)f(u - r), 

for all u G (c^^^ , 00) . 

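(ii) For every $r \in [0, \infty)$, there exist a $c_r^{(2)} \in (0, \infty)$, an $\alpha_r \in (1/2, 1]$ and
an increasing isomorphic function $\zeta^{(r)} : [0, 1] \mapsto [0, 1]$ such that

$F(F^{-1}\{\zeta^{(r)}(\epsilon)\} + r) \vee [1 - F(F^{-1}\{\zeta^{(r)}(1 - \epsilon)\} - r)] \le c_r^{(2)} \epsilon^{\alpha_r}$

for all $\epsilon$ sufficiently small.
Then $F$ satisfies B5(c).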

3. Maximum penalized log-likelihood estimation. Under model (1.1)
the log-likelihood for a single observation at $X = x = (v, \delta, z, w)$ for the
parameter choice $(\beta, h, H)$ is

(3.1) $l(x; \beta, h, H) = \delta \log\{F[\beta'z + h(w) + H(v)]\} + (1 - \delta)\log\{1 - F[\beta'z + h(w) + H(v)]\}$.

Intuitively, we will need some mechanism to control the smoothness of
estimates of $h_0$. One approach is to use sieve estimates with assumptions
on the derivatives, as in [23]. However, we use instead a penalized approach
based on splines. An advantage is that the degree of smoothness is controlled
by a single number, the penalty term. Specifically, we use the following
penalized log-likelihood function based on $n$ observations:

$l_n^p(\beta, h, H) = \mathbb{P}_n l(x; \beta, h, H) - \lambda_n^2 J^2(h)$,

where $\mathbb{P}_n$ is the empirical measure based on the $n$ observations $X_1, \ldots, X_n$
and $\lambda_n$ is the (possibly data-dependent) smoothing parameter. We maximize
$l_n^p$ under the constraints that $\beta \in \bar B_0$, $h \in \mathcal{Q}_\nu$ with $|h| < c_0$ and $\mathbb{P}_n h(W) = 0$,
and that $H$ is nondecreasing, where $\bar B$ denotes the closure of the set $B$.
We also define $l_n(\beta, h, H) = \mathbb{P}_n l(x; \beta, h, H)$ to be the ordinary log-likelihood
function. The maximum penalized log-likelihood estimators $\hat\beta_n$, $\hat h_n$ and $\hat H_n$
are the ones that maximize the penalized log-likelihood function under the
stated constraints. We make the following assumption about the penalty
term $\lambda_n$:

C. $\lambda_n = O_p(n^{-1/3})$ and $\lambda_n^{-1} = O_p(n^{1/3})$.
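
The penalized criterion is straightforward to evaluate once the linear predictors are available. The following is a minimal sketch, not the authors' implementation: it computes the single-observation log-likelihood (3.1), approximates $J^2(h)$ for $\nu = 2$ by second differences of $h$ on an equally spaced grid, and combines them with the deterministic choice $\lambda_n = n^{-1/3}$, which satisfies condition C. All function names and the finite-difference approximation are assumptions.

```python
import math


def log_lik_obs(delta, eta, F):
    """Log-likelihood (3.1) at eta = beta'z + h(w) + H(v)."""
    p = F(eta)
    return delta * math.log(p) + (1 - delta) * math.log(1.0 - p)


def roughness_penalty(h_vals, step):
    """Approximate J^2(h) for nu = 2 from equally spaced values of h.

    Uses central second differences at interior grid points and a
    Riemann sum, so the result is only a numerical approximation.
    """
    total = 0.0
    for i in range(1, len(h_vals) - 1):
        second = (h_vals[i - 1] - 2.0 * h_vals[i] + h_vals[i + 1]) / step ** 2
        total += second ** 2 * step
    return total


def default_lambda(n):
    """Smoothing parameter lambda_n = n^(-1/3)."""
    return n ** (-1.0 / 3.0)


def penalized_log_lik(deltas, etas, F, j2, lam):
    """Empirical mean of (3.1) minus lam^2 * J^2(h)."""
    n = len(deltas)
    mean_ll = sum(log_lik_obs(d, e, F) for d, e in zip(deltas, etas)) / n
    return mean_ll - lam ** 2 * j2
```

A linear $h$ has zero penalty under this approximation, so the penalty only discourages curvature, and larger values of `lam` pull the fitted $h$ toward a straight line.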

Remark 4. The tuning parameter is usually selected through certain
cross validation techniques (see [49] for reference). However, for the
asymptotics to hold, we only require $\lambda_n$ to be of the correct order. One way of
achieving this is to simply set $\lambda_n = n^{-1/3}$, as we do for the simulation studies
shown in Section 9. In addition, while our theoretical results require
specification of $c_0$, it appears from our experience that $c_0$ does not actually need
to be specified for implementation with moderate sample sizes. Numerical
studies show for finite sample sizes that this strategy usually yields
satisfactory results. Note that we are oversmoothing in our choice of penalty term.
This is a consequence of interference from estimation of $H$. In Theorem 4
presented in Section 5, and in the proof of the theorem given in Section 10,
it is demonstrated that the entropy for the model is driven by the entropy
of the class of monotone functions which parameterizes $H$, resulting in an
overall rate of $O_p(n^{-1/3})$. Although we can refine the rate for estimation of
$\beta$ to $O_p(n^{-1/2})$, it is unclear how to improve the rate of $O_p(n^{-1/3})$ for
estimating $h$ when $\nu > 1$. It is an open question whether achieving the optimal
rate for $h$ in model (1.1) is even possible using penalized log-likelihood.

Remark 5. Let $V_{(1)}, \ldots, V_{(n)}$ be the order statistics of $V_1, \ldots, V_n$. Let
$\delta_{(i)}$, $W_{(i)}$ and $Z_{(i)}$ correspond to $V_{(i)}$. Since only the values of $H$ at $V_{(i)}$ matter
in the log-likelihood function, we will take the maximum likelihood estimator
$\hat H_n$ as the right-continuous nondecreasing step function with jump points
at $V_{(i)}$.

Remark 6. From time to time, it will be convenient to assume that
$\delta_{(1)} = 1$ and $\delta_{(n)} = 0$. Such an assumption usually has little impact on results.
To see this, note that if $\delta_{(1)} = 0$, then this observation makes a contribution





S. MA AND M. R. KOSOROK 



of zero to the log-likelihood function after maximizing over $H$. Similarly, if
$\delta_{(n)} = 1$, then the corresponding observation also makes no contribution to
the log-likelihood. We will make our use of this assumption clear. Further
discussion about this assumption can be found in Section 3.1 of [19].

4. Consistency. The following lemma establishes existence of the maxi- 
mum penalized log-likelihood estimators. 

Lemma 3. Under the assumptions A1-A4, B1-B5, the assumption of
Remark 6, and provided $0 < \lambda_n < \infty$, a maximum penalized log-likelihood
estimator $\hat\psi_n = (\hat\beta_n, \hat h_n, \hat H_n)$ exists, with $\hat\beta_n \in \bar B_0$, $\hat h_n \in \mathcal{Q}_\nu$,
$\sup_{s \in [a, b]} |\hat h_n(s)| < c_0$, $\mathbb{P}_n \hat h_n(W) = 0$ and $-\infty < \hat H_n(V_{(1)}) \le \hat H_n(V_{(n)}) < \infty$.

Remark 7. Provided $\delta_{(j)} = 1$ and $\delta_{(j+1)} = 0$ for some $j \in \{1, \ldots, n - 1\}$,
Lemma 3 implies that $-\infty < \hat H_n(V_{(k)}) \le \hat H_n(V_{(l)}) < \infty$, where $k = \inf\{j \in
\{1, \ldots, n\} : \delta_{(j)} = 1\}$ and $l = \sup\{j \in \{1, \ldots, n\} : \delta_{(j)} = 0\}$. Note that the
log-likelihood contributions for the observations corresponding to $\delta_{(j)}$ for
$j \notin \{k, \ldots, l\}$ are zero.

Define the following distance between parameters $(h_1, H_1)$ and $(h_2, H_2)$:
$d_F((h_1, H_1), (h_2, H_2)) = \|h_1 - h_2\|_2 + \|H_1 - H_2\|_{F,2}$, where $\|h_1 - h_2\|_2 =
[\int_a^b |h_1(w) - h_2(w)|^2\,dw]^{1/2}$ and $\|H_1 - H_2\|_{F,2} = [\int_{l_v}^{u_v} |F(H_1(v)) -
F(H_2(v))|^2\,dv]^{1/2}$. Note that since $F$ has a bounded derivative, $\|\cdot\|_{F,2}$ is
somewhat weaker than the usual $L_2$ norm. We use $\|\cdot\|$ to denote Euclidean
distance. The following is the main consistency result.

Theorem 1. Under the assumptions A1-A4, B1-B5 and C,
$d_F((\hat h_n, \hat H_n), (h_0, H_0)) + \|\hat\beta_n - \beta_0\| = o_p(1)$.

Remark 8. We will show in Section 5 that $J(\hat h_n) = O_p(1)$ and, thus,
Theorem 1 combined with condition A1(b) will imply $\sup_{w \in [a, b]} |\hat h_n(w) -
h_0(w)| = o_p(1)$, since the $\hat h_n$'s are smooth functions defined on a compact
set with asymptotically bounded first-order derivatives.

Under the provision given in Remark 7, and with $k$ and $l$ as given in that
remark, define

$\tilde H_n(t) = \hat H_n(V_{(k)})$ for $t \in [l_v, V_{(k)})$,
$\tilde H_n(t) = \hat H_n(t)$ for $t \in [V_{(k)}, V_{(l)}]$,
$\tilde H_n(t) = \hat H_n(V_{(l)})$ otherwise,

and let $\tilde H_n = 0$ if the provision is not met. Theorem 1 clearly implies that
$\|\tilde H_n - H_0\|_{F,2} = o_p(1)$; and although $L_2$-convergence does not imply uniform
convergence, the following theorem ensures that $|\tilde H_n|$ is bounded.
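
The truncation just described can be written out directly. The sketch below takes the ordered times, the ordered indicators and the values of the step-function estimate at the ordered times, finds $k$ and $l$ as in Remark 7, and returns the clamped function. The function name is hypothetical, ties among the $V_i$ are assumed absent, and the case where the provision of Remark 7 fails is simply rejected here rather than handled.

```python
import bisect


def modified_estimator(v_sorted, deltas_sorted, h_hat):
    """Return t -> clamped estimate, given h_hat[i] = H-hat_n(V_(i)).

    Indices are 0-based: k is the first index with delta = 1 and
    l is the last index with delta = 0.
    """
    ones = [i for i, d in enumerate(deltas_sorted) if d == 1]
    zeros = [i for i, d in enumerate(deltas_sorted) if d == 0]
    if not ones or not zeros or ones[0] > zeros[-1]:
        raise ValueError("provision of Remark 7 is not met")
    k, l = ones[0], zeros[-1]

    def h_tilde(t):
        if t < v_sorted[k]:
            return h_hat[k]          # constant to the left of V_(k)
        if t > v_sorted[l]:
            return h_hat[l]          # constant to the right of V_(l)
        # Right-continuous step function: last jump point <= t.
        i = bisect.bisect_right(v_sorted, t) - 1
        return h_hat[i]

    return h_tilde
```

Observations to the left of $V_{(k)}$ all have $\delta = 0$ and those to the right of $V_{(l)}$ all have $\delta = 1$, so the unrestricted maximizer is free to diverge there; clamping replaces those values with the nearest finite ones.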

Theorem 2. Under the conditions of Theorem 1, we have
$|\tilde H_n(l_v)| \vee |\tilde H_n(u_v)| = O_p(1)$.

Remark 9. Theorems 1 and 2 jointly imply that $\int_{l_v}^{u_v} |\tilde H_n(v) -
H_0(v)|^2\,dv = o_p(1)$. It appears that this cannot be strengthened to uniform
consistency. In the following theorem, we prove in the setting where
covariate effects are not included in the model that uniform consistency of $\tilde H_n$ is
impossible under conditions A1(c) and B4 and when the distribution of $V$
is continuous.

Theorem 3. Assume the nonparametric current status model [model
(1.1) without $\beta$ or $h$] holds, and let $G_0$ denote the unknown true
distribution of $U$. Assume condition A1(c) holds for the current status time and that
$G_0$ is continuous with $0 < G_0(l_v) < G_0(u_v) < 1$ (essentially condition B4).
Assume also that the distribution of $V$ is continuous. Let $\hat G_n$ be the
nonparametric maximum likelihood estimator of $G_0$ (no penalization is involved
since $h$ is not in the model), and assume that the condition in Remark 6
holds. Then:

(i) for any $\epsilon \in (0, G_0(l_v))$, $\liminf_{n \to \infty} P\{\hat G_n(V_{(1)}) < G_0(l_v) - \epsilon\} > 0$, and

(ii) for any $\epsilon \in (0, \{1 - G_0(u_v)\})$, $\liminf_{n \to \infty} P\{\hat G_n(V_{(n)}) > G_0(u_v) +
\epsilon\} > 0$.

Remark 10. Through simulation studies involving up to 20,000
observations, we have verified that the bias predicted in Theorem 3 does persist
as $n$ gets large. This does not contradict the uniform consistency result in
Section II.4.1 of [19]. In their consistency proof, they require the probability
measure for the current status value $V$ to dominate the distribution of $U$.
Thus, the total variation of $\hat G_n$ over the support of $V$ is bounded by 1, and
the total variation of $G_0$ over the support of $V$ is equal to 1. Thus, in this
setting, $L_2$-convergence of $\hat G_n$ with respect to the measure $G_0$ will indeed
imply uniform convergence.
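
In the covariate-free setting of Theorem 3, the NPMLE at the ordered observation times can be computed as the isotonic regression of the ordered indicators, a standard characterization for current status data. The sketch below uses the pool-adjacent-violators algorithm; it is offered as an illustration rather than as the code used for the simulations mentioned above. The function name is hypothetical and ties among the $V_i$ are assumed absent.

```python
def npmle_current_status(v, delta):
    """NPMLE of the distribution function at the ordered times.

    Returns (sorted times, fitted values), where the fitted values are
    the isotonic (nondecreasing) regression of the ordered indicators.
    """
    order = sorted(range(len(v)), key=lambda i: v[i])
    blocks = []  # each block is [sum of indicators, number of points]
    for i in order:
        blocks.append([float(delta[i]), 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [v[i] for i in order], fitted
```

With this function the boundary behavior described in Theorem 3 can be explored by repeatedly simulating data whose censoring times have bounded support and recording the fitted values at the smallest and largest times.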

Remark 11. By expression (10.5) in the proof of Theorem 1, and by
the results of Theorems 1 and 2, it is clear that if

(4.1) $P\{V = l_v\} \wedge P\{V = u_v\} > 0$,

then $\tilde H_n$ is uniformly consistent for $H_0$ over the interval $[l_v, u_v]$.
Unfortunately, the assumption (4.1) does not seem to be very realistic. Fortunately,
for the results that follow, $L_2$ convergence of $\tilde H_n$ is sufficient.
5. Rates of convergence. For this section we will use the usual $L_2$
distance between parameters $(h_1, H_1)$ and $(h_2, H_2)$: $d((h_1, H_1), (h_2, H_2)) = \|h_1 -
h_2\|_2 + \|H_1 - H_2\|_2$, where $\|H_1 - H_2\|_2 = [\int_{l_v}^{u_v} |H_1(v) - H_2(v)|^2\,dv]^{1/2}$. We now
establish the rate of convergence for all parameters.

Theorem 4. Rate of convergence: Suppose that assumptions A1-A5,
B1-B4 and C are satisfied. Then $J(\hat h_n) = O_p(1)$ and $\|\hat\beta_n - \beta_0\| + d((\hat h_n, \tilde H_n),
(h_0, H_0)) = O_p(n^{-1/3})$.

Remark 12. In ordinary spline settings, $\lambda_n = O_p(n^{-\nu/(2\nu+1)})$ and the
optimal rate of convergence for a smooth function of the covariate $w$ is
$O_p(\lambda_n)$ (see, e.g., [49]). Typically, $\lambda_n$ is a data driven smoothing parameter
selected by cross validation, GCV, Lp or Lc, as discussed in [27]. However,
in our case the rate of convergence for $\tilde H_n$ cannot exceed $n^{1/3}$, which slows
down the convergence of $\hat h_n$. In particular, the rate for $\hat h_n$ does not achieve
the optimal convergence rate of $O_p(\lambda_n)$ when $\nu > 1$, as is achieved in [30].
It is also worth pointing out that it appears we cannot achieve a better
convergence rate by modifying the smoothness assumptions.

Remark 13. As discussed in Section 1, a special case of the general
transformation models is the proportional hazards model. It is shown by
Groeneboom and Wellner [19] that the convergence rate of the NPMLE of a
distribution function is $n^{1/3}$ for current status data. So we have shown that,
under reasonable model assumptions, the convergence rate of $\tilde H_n$ achieves
the optimal rate, despite the presence of an additional infinite-dimensional
parameter $h$.

In some cases it is reasonable to assume that the transformation function 
is also continuously differentiable. In [29] penalized estimation of the trans- 
formation function is investigated under certain smoothness assumptions. A 
sharper convergence rate can be achieved if we assume the transformation 
function belongs to a certain Sobolev space. In our case, if we assume the
transformation function $H_0$ and the nonparametric effect $h_0$ belong to the
same Sobolev space, then we can achieve optimal convergence rates for both
parameters by using doubly penalized estimators. However, if it is assumed 
that h and H belong to different Sobolev spaces, then it is unclear whether 
we can achieve the optimal convergence rate for both estimators. 
6. Weak convergence of the parametric covariate effect.

6.1. Information calculation. It is well known that in most parametric
models we can estimate the finite-dimensional parameter at the $n^{1/2}$
convergence rate. However, this is not necessarily true for semiparametric models.
A necessary condition is that we have positive information, which is not a
trivial condition. Next we show that, for our model, indeed, we have positive
information. The following extra model assumptions will be needed:



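D1. $\exists\,\tilde h \in \mathcal{Q}_\nu$ such that

$E\left\{Q^2_{\psi_0}(X)\,[Z - \tilde h(W)] \times \left[h(W) - \frac{E\{h(W)Q^2_{\psi_0}(X) \mid V\}}{E\{Q^2_{\psi_0}(X) \mid V\}}\right]\right\} = 0$

for every $h \in \mathcal{Q}_\nu^{c_0}$, where
$\psi = (\beta, h, H)$, $\psi_0 = (\beta_0, h_0, H_0)$, $\theta_\psi(x) = \beta'z + h(w) + H(v)$ and

$q(v) = \frac{E\{ZQ^2_{\psi_0}(X) \mid V = v\}}{E\{Q^2_{\psi_0}(X) \mid V = v\}} - \frac{E\{\tilde h(W)Q^2_{\psi_0}(X) \mid V = v\}}{E\{Q^2_{\psi_0}(X) \mid V = v\}}$

has a derivative which is uniformly bounded on $[l_v, u_v]$.

D2. $I_0 = E(\tilde l\,\tilde l')$ is positive definite, where $\tilde l = \{Z - \tilde h(W) - q(V)\}Q_{\psi_0}(X)$.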



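Before giving the main result of this section, Theorem 5, we present a
lemma which provides sufficient conditions for achieving D1 and
determining $\tilde h$. Let $D_0(v, w) = E[Q^2_{\psi_0}(X) \mid V = v, W = w]$, $D_1(v, w) = E[ZQ^2_{\psi_0}(X) \mid V =
v, W = w]$, $D_{01}(v) = E[Q^2_{\psi_0}(X) \mid V = v]$ and $D_{02}(w) = E[Q^2_{\psi_0}(X) \mid W = w]$; and
define $\mathcal{S}_0$ to be the class of functions $g : [l_v, u_v] \times [a, b] \mapsto \mathbb{R}$ with
$E[g^2(V, W) \times D_0(V, W)] < \infty$. For any $g \in \mathcal{S}_0$, let $\Pi_1$ be the projection operator

$\Pi_1 g(v) = \frac{E[g(V, W)Q^2_{\psi_0}(X) \mid V = v]}{E[Q^2_{\psi_0}(X) \mid V = v]}$

and let $\Pi_2$ be the projection operator

$\Pi_2 g(w) = \frac{E[g(V, W)Q^2_{\psi_0}(X) \mid W = w]}{E[Q^2_{\psi_0}(X) \mid W = w]}$.

Also define $D^* = D_1/D_0$, $D_1^* = \Pi_1 D^*$, $D_2^* = \Pi_2 D^*$, $R(v, w) = D_0(v, w)/D_{01}(v)$,
$S(v, w) = D_0(v, w)/D_{02}(w)$, $\dot D_1^*(v) = (\partial/(\partial v))D_1^*(v)$, $R_1(v, w) =
(\partial/(\partial v))R(v, w)$, $D_2^{*(\nu)}(w) = (\partial/(\partial w))^\nu D_2^*(w)$ and $S^{(\nu)}(v, w) = (\partial/(\partial w))^\nu S(v, w)$.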
Lemma 4. Assume the model conditions of Section 2.1. Also assume
that $V$ and $W$ are independent with the density of $W$ being bounded above
and below on $[a, b]$, that $\dot D_1^*(v)$ and $E[R_1(v, W)]^2$ are uniformly bounded on
$[l_v, u_v]$, and that both $E[D_2^{*(\nu)}(W)]^2 < \infty$ and $E[S^{(\nu)}(v, W)]^2 < \infty$. Then D1
is satisfied with $\tilde h = h^* - E h^*(W)$, where $h^* = \lim_{m \to \infty}[\sum_{j=0}^{m} (\Pi_2\Pi_1)^j]\Pi_2(I -
\Pi_1)D^*$.

Remark 14. We note that while the conditions of Lemma 4 are
somewhat stronger than our previous model assumptions, the conditions are still
reasonable. The most arduous of the new assumptions involve bounding the
derivatives of several well-defined functions. Note that the denominators of
the ratios that define these functions are $D_{01}(v)$ and $D_{02}(w)$. Since $D_{01}$
and $D_{02}$ are bounded below on $[l_v, u_v]$ and $[a, b]$, respectively, the task of
bounding the necessary derivatives is simplified somewhat.

Theorem 5. Calculation of efficient information: Under model
assumptions A1-A4, B1-B5 and D1-D2, $I_0$ is the efficient information matrix for
$\beta$.

Remark 15. Knowledge about the degree of smoothness for $h$ cannot
currently be utilized to improve the rate of convergence for $\hat h_n$ or the
asymptotic precision of $\hat\beta_n$. This is a consequence of the fact that the Sobolev space
$\mathcal{Q}_{\nu_1}$ is dense in $\mathcal{Q}_{\nu_2}$ when $\nu_1 > \nu_2$. However, it is unclear whether such
knowledge can result in small sample improvements in the accuracy of the estimators.

6.2. Asymptotic normality and efficiency. 

Theorem 6. Asymptotic normality and efficiency: Assume conditions
A1-A4, B1-B5, C and D1-D2. Also assume that the inverse of $H_0$, $H_0^{-1}$,
has a derivative bounded on compacts. Then $n^{1/2}(\hat\beta_n - \beta_0) = I_0^{-1}\sqrt{n}\,\mathbb{P}_n\tilde l +
o_p(1) \to_d N(0, I_0^{-1})$.

Since $\hat\beta_n$ is asymptotically linear with efficient influence function, and
the model is sufficiently smooth (Hellinger differentiable), it is
asymptotically efficient in the sense that any regular estimator has asymptotic
covariance matrix no less than that of $\hat\beta_n$ [6]. The additional assumption on the
smoothness of $H_0^{-1}$ is needed to construct an approximately least-favorable
submodel (see Section 25.11 of [47]) under which the given estimator satisfies
the efficient score equation.

Another important issue for statistical modeling with current status data 
is the degree of robustness achieved under model misspecification. Yu and
van der Laan [53] investigate doubly robust estimation in longitudinal marginal
structural models. Their results allow one to construct locally efficient
estimators of the regression parameters, under the misspecification of either 
the unknown regression function or the conditional distribution of the linear 
variables. The question of whether double robustness holds in our setting is 
also of interest, but is beyond the scope of the current paper. 

7. The block jackknife. One potential method of inference for $\beta$ is to
use the nonparametric bootstrap. Unfortunately, there is no sufficiently
general theory, as far as we are aware, available for the nonparametric
bootstrap in the penalized maximum likelihood estimation setting. Alternative
approaches are the $m$ within $n$ bootstrap (see [5]) or subsampling (see
[33]). Since $\sqrt{n}(\hat\beta_n - \beta_0)$ has a continuous limiting distribution $\mathcal{L}$,
Theorem 2.1 of [33] yields that the $m$ out of $n$ subsampling bootstrap converges,
conditionally on the data, to the same distribution $\mathcal{L}$, provided $m/n \to 0$
and $m \to \infty$ as $n \to \infty$. Because of the requirement that $m \to \infty$ as $n \to \infty$,
the subsampling bootstrap potentially involves many calculations of the
estimator. Fortunately, the asymptotic linearity given in Theorem 6 can be
used to formulate a computationally simpler method of inference.

Let $\tilde\beta_n$ be any asymptotically linear estimator of a parameter $\beta_0 \in \mathbb{R}^d$,
based on an i.i.d. sample $X_1, \ldots, X_n$, having square-integrable influence
function $\phi$ for which $E[\phi\phi']$ is nonsingular. Let $m$ be a fixed integer $> d$, and,
for each $n \ge m$, define $k_{m,n}$ to be the largest integer satisfying $m k_{m,n} \le n$.
Also define $N_{m,n} = m k_{m,n}$. For the data $X_1, \ldots, X_n$, compute $\tilde\beta_n$ based on
the proposed estimation method and randomly sample $N_{m,n}$ out of the $n$
observations without replacement, to obtain $X_1^*, \ldots, X_{N_{m,n}}^*$. For $j = 1, \ldots, m$,
let $\tilde\beta_{n,j}^*$ be the estimate of $\beta$ based on the observations $X_1^*, \ldots, X_{N_{m,n}}^*$ after
omitting $X_j^*, X_{m+j}^*, \ldots, X_{(k_{m,n}-1)m+j}^*$. Compute

$\bar\beta_n^* = m^{-1}\sum_{j=1}^{m} \tilde\beta_{n,j}^*$ and $S_n^* = (m - 1)k_{m,n}\sum_{j=1}^{m} (\tilde\beta_{n,j}^* - \bar\beta_n^*)(\tilde\beta_{n,j}^* - \bar\beta_n^*)'$.

The following lemma provides a method of obtaining asymptotically valid
confidence ellipses for $\beta_0$.

Lemma 5. Let $\tilde\beta_n$ be an estimator of $\beta_0 \in \mathbb{R}^d$, based on an i.i.d. sample
$X_1, \ldots, X_n$, which satisfies $n^{1/2}(\tilde\beta_n - \beta_0) = \sqrt{n}\,\mathbb{P}_n\phi + o_p(1)$, where $E[\phi\phi']$ is
nonsingular. Then $n(\tilde\beta_n - \beta_0)'[S_n^*]^{-1}(\tilde\beta_n - \beta_0)$ converges weakly to
$d(m - 1) \times F_{d,m-d}/(m - d)$, where $F_{r,s}$ has an $F$ distribution with degrees of freedom
$r$ and $s$.

The key to the proof of Lemma 5 is the simultaneous validity of the 
asymptotic linearity expansion for all of the jackknife estimates. The details 
of the proof are given in the Appendix. 
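
The block jackknife described above can be sketched in a few lines. The function below is an illustration under stated assumptions, not the authors' code: `estimator` is any user-supplied function mapping a list of observations to a list of $d$ numbers, the random subsample is drawn with Python's standard generator, and the function name and return layout are hypothetical.

```python
import random


def block_jackknife(estimator, data, m, seed=0):
    """Block jackknife: returns (estimate, S_star, subsample).

    estimator maps a list of observations to a list of d floats;
    S_star is returned as a d x d list of lists.
    """
    n = len(data)
    k = n // m                       # k_{m,n}: largest integer with m*k <= n
    big_n = m * k                    # N_{m,n}
    beta_n = list(estimator(data))   # estimate from all n observations
    # Draw N_{m,n} of the n observations without replacement.
    sample = random.Random(seed).sample(data, big_n)
    # The j-th estimate omits positions j, m + j, 2m + j, ... (0-based).
    betas = [
        list(estimator([x for pos, x in enumerate(sample) if pos % m != j]))
        for j in range(m)
    ]
    d = len(beta_n)
    beta_bar = [sum(b[r] for b in betas) / m for r in range(d)]
    s_star = [
        [(m - 1) * k * sum((b[r] - beta_bar[r]) * (b[c] - beta_bar[c])
                           for b in betas)
         for c in range(d)]
        for r in range(d)
    ]
    return beta_n, s_star, sample
```

By Lemma 5, an approximate confidence ellipsoid consists of all $\beta$ with $n(\tilde\beta_n - \beta)'[S_n^*]^{-1}(\tilde\beta_n - \beta)$ no larger than $d(m-1)/(m-d)$ times the appropriate quantile of the $F_{d,m-d}$ distribution. The estimator is evaluated $m + 1$ times in total.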
Remark 16. Let $S_n^{**}$ be $S_n^*$ with $\bar\beta_n^*$ replaced by the estimator of $\beta$ based
on $X_1^*, \ldots, X_{N_{m,n}}^*$ (which we denote $\tilde\beta_n^*$). Using arguments in the proof of
Lemma 5, it is straightforward to show that replacing $S_n^*$ with $S_n^{**}$ will not
affect the conclusions of Lemma 5. Those same arguments also lead to the
conclusion that one cannot, in general, replace $\tilde\beta_n^*$ with $\tilde\beta_n$ except when
$N_{m,n} = n$ (in which case $\tilde\beta_n^* = \tilde\beta_n$).

Remark 17. The block jackknife procedure only requires computing
the estimator $m + 1$ times (or $m + 2$ times when $S_n^{**}$ is used as discussed in
Remark 16). In our simulation studies we have found that for the proposed
estimator $m = 10$ works for sample sizes $n = 400$ and 1600. The fact that $m$
remains fixed as $n \to \infty$ in the proposed approach results in a potentially
significant computational savings over subsampling, which requires $m$ to grow
increasingly large as $n \to \infty$. A potential challenge for the proposed approach
is in choosing $m$ for a given data set. The larger $m$ is, the larger the
denominator degrees of freedom in $F_{d,m-d}$ and the tighter the confidence ellipsoid.
On the other hand, $m$ cannot be so large that the asymptotic linearity of
Theorem 6 does not hold simultaneously for all jackknife components.

8. Computational techniques. 

8.1. Overall strategy. Computationally, finding the penalized MLE for $\beta$,
$h$ and $H$ is a maximization problem subject to the boundedness constraint
for $h$ and the nondecreasing constraint for $H$. It is unlikely that there exists
an analytic solution for this model. So we propose the following iterative
maximization technique.

S1. For a given $\hat\beta_n^{(k)}$ and $\hat h_n^{(k)}$, estimate $\hat H_n^{(k)}$ by maximizing $l_n^p$ with respect
to $H$ under the constraint that $\hat H_n^{(k)}$ is a nondecreasing step function.

S2. For a given $\hat H_n^{(k)}$, find $\hat\beta_n^{(k+1)}$ and $\hat h_n^{(k+1)}$ that maximize the penalized
log-likelihood function.

We have found that almost any reasonable initial values will work. The above
two-step maximization procedure is repeated until certain convergence
criteria are satisfied. The global convexity of the log-likelihood function
guarantees that we can reach the maximum by the above technique. For step
S2, our experience indicates that, in applications involving moderate sample
sizes, specification of $c_0$ is not needed and $\lambda_n = n^{-1/3}$ appears to work
most of the time. Perhaps using cross validation to choose $\lambda_n$ may improve
the performance of the estimator in some settings, but evaluating this issue
requires further study and is beyond the scope of the current paper.
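The alternation between steps S1 and S2 can be sketched generically as below. This is a minimal illustration under stated assumptions: the function names, the relative stopping rule and the toy quadratic objective are hypothetical and stand in for the penalized log-likelihood and its two partial maximizers.

```python
def alternate_maximize(objective, update_first, update_second,
                       first, second, tol=1e-10, max_iter=1000):
    """Alternate two partial maximizations until the objective stabilizes.

    update_first(second)  plays the role of S1 (maximize over H for fixed beta, h)
    update_second(first)  plays the role of S2 (maximize over beta, h for fixed H)
    """
    value = objective(first, second)
    for _ in range(max_iter):
        first = update_first(second)
        second = update_second(first)
        new_value = objective(first, second)
        done = abs(new_value - value) <= tol * (1.0 + abs(value))
        value = new_value
        if done:
            break
    return first, second, value

# Toy concave objective, used only to exercise the loop.
def toy_objective(a, b):
    return -(a - 1.0) ** 2 - (b + 2.0) ** 2 - 0.5 * a * b

a_hat, b_hat, best = alternate_maximize(
    toy_objective,
    update_first=lambda b: 1.0 - 0.25 * b,    # argmax over a for fixed b
    update_second=lambda a: -2.0 - 0.25 * a,  # argmax over b for fixed a
    first=0.0, second=0.0)
```

For this toy problem the iterates approach the stationary point a = 1.6, b = -2.4. In the actual procedure each update is itself a numerical optimization, so only approximate maximization is attained after finitely many sweeps.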

Remark 18. After finitely many iterations, what we get is not exactly the
penalized MLE. However, a very nice property of the efficiency theorem is that
we only need approximate maximization to achieve asymptotic efficiency for
$\beta$, that is, $\mathbb{P}_n k_{\hat\beta_n,\hat h_n,\hat H_n} = o_p(n^{-1/2})$, where k is the estimating function as
defined in the proof of Theorem 6.

8.2. Sieve approximation for the nonparametric covariate effect. For the
special case in which the penalty J is based on the second derivative of h, we
can use a cubic spline for estimating h. Suppose
a function $h^*$ maximizes the penalized log-likelihood function. Then there
exists a cubic spline function $h_n$ such that $h^*(w_i) = h_n(w_i)$ for $i = 1, \ldots, n$
and $J(h_n) \le J(h^*)$. For a proof, see page 18 of [18]. The number of basis
functions of a cubic spline increases at the rate O(n). Computationally this
can be quite time consuming for a moderate or large set of observations.
Hence, we take a computational sieve approach suggested by Xiang and 
Wahba [51], which states that an estimate with the number of basis functions 
growing at least at the rate n^/^ can achieve the same asymptotic precision 
as the full space. The K-means clustering technique (see [25] for reference)
is used to select the proper positions of knots, and B-spline basis functions
are utilized. Because of the accuracy of this sieve approximation and the
fact that the degree of smoothness is still controlled by the penalty term, 
the theoretical properties of the resulting estimators should be unmodified 
from the previously derived theory. 
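A minimal sketch of knot selection by one-dimensional K-means is given below, assuming Lloyd's algorithm with centres initialised at sample quantiles. The function name and the initialisation are choices of this illustration; the number of knots is left to the user, and the B-spline basis built on the selected knots is not shown.

```python
def kmeans_knots(w, num_knots, max_iter=100):
    """Return num_knots cluster centres of the values in w, sorted increasingly."""
    w = sorted(w)
    n = len(w)
    # Initialise centres at evenly spaced sample quantiles (an assumption).
    centres = [w[int((i + 0.5) * n / num_knots)] for i in range(num_knots)]
    for _ in range(max_iter):
        sums = [0.0] * num_knots
        counts = [0] * num_knots
        for x in w:
            # Assign each observation to its nearest centre.
            j = min(range(num_knots), key=lambda c: abs(x - centres[c]))
            sums[j] += x
            counts[j] += 1
        # Move each centre to the mean of its cluster; keep empty clusters fixed.
        new = [sums[j] / counts[j] if counts[j] else centres[j]
               for j in range(num_knots)]
        if new == centres:
            break
        centres = new
    return sorted(centres)
```

Because the knots follow the empirical distribution of W, regions with many observations receive more knots, while the penalty term still controls the smoothness of the fitted function.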

8.3. Estimation of the transformation function. The maximization over 
the nondecreasing function H can be solved by some commercial software 
package, such as NPSOL. However, we show that the cumulative sum dia- 
gram approach, as discussed by Groeneboom and Wellner [19] and Huang 
[22], works for general transformation models. First we observe the following
properties of $\hat H_n$.

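Lemma 6. Assume that $\delta_{(1)} = 1$ and $\delta_{(n)} = 0$. Then for any fixed $\beta_n$ and
$h_n$, the maximum likelihood estimator $\hat H_n$ satisfies

(8.1)  $\sum_{j \ge i} \left\{ \delta_{(j)} \frac{f}{F} - (1 - \delta_{(j)}) \frac{f}{1 - F} \right\} (\beta_n' Z_{(j)} + h_n(W_{(j)}) + \hat H_n(V_{(j)})) \le 0$

for $i = 1, \ldots, n$, and

(8.2)  $\sum_{i=1}^{n} \left\{ \delta_i \frac{f}{F} - (1 - \delta_i) \frac{f}{1 - F} \right\} (\beta_n' Z_i + h_n(W_i) + \hat H_n(V_i)) \, \hat H_n(V_i) = 0.$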

This lemma can be proved in a manner similar to Proposition 1.1 of 
[19], and we omit the details. These properties motivate us to consider the 
following iterative, but computationally efficient algorithm. 

Define the process Wh, Gh and Oh by 

Jv'c^[o,v] F\e^)[i- F{e^)) 

Oh{v) = Wh{v)+ J H{v')dGH, 
where v > and Rn is the unobserved empirical measure of (?7, V, Z, W). 

Lemma 7 (Self-induced calculation of $\hat H_n$). Assume that $\delta_{(1)} = 1$ and
$\delta_{(n)} = 0$. Then for any fixed $\beta_n$ and $h_n$, the maximum likelihood estimator
$\hat H_n$ is the left derivative of the greatest convex minorant of the "self-induced"
cumulative sum diagram, consisting of the points $(G_{\hat H_n}(V_{(j)}), O_{\hat H_n}(V_{(j)}))$ and
the origin (0, 0).

The proof of Lemma 7 is analogous to that of Proposition 1.4 of [19]
and is omitted. This lemma gives an iterative procedure for finding $\hat H_n$, as
discussed in [19]. Suppose $\hat H_n^{(k)}$ is the result of the kth iteration; then $\hat H_n^{(k+1)}$
is computed as the left derivative of the greatest convex minorant of the
cumulative sum diagram, consisting of the points $(G_{\hat H_n^{(k)}}(V_{(j)}), O_{\hat H_n^{(k)}}(V_{(j)}))$
and the origin (0, 0).
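The iteration described above can be sketched as follows for the Cox-type choice $F(t) = 1 - \exp(-e^t)$. Several ingredients are assumptions of this illustration rather than the procedure of the paper: the increments of G are taken to be $f^2/\{F(1 - F)\}$ evaluated at the current fit, the increments of W are the derivatives of the log-likelihood terms, a step-halving safeguard is added so that the log-likelihood never decreases, and arguments are clipped to avoid overflow. The left derivative of the greatest convex minorant is computed by the pool-adjacent-violators algorithm.

```python
import math

def _clip(t, lo=-30.0, hi=30.0):
    return max(lo, min(hi, t))

def loglik(H, g, d):
    """Current status log-likelihood; g holds the offsets beta'z + h(w)."""
    total = 0.0
    for Hi, gi, di in zip(H, g, d):
        e = math.exp(_clip(gi + Hi))
        total += math.log(-math.expm1(-e)) if di else -e  # log F or log(1 - F)
    return total

def score_and_weight(H, g, d):
    """Per-observation derivative in H_i and ASSUMED weight f^2 / {F (1 - F)}."""
    s, w = [], []
    for Hi, gi, di in zip(H, g, d):
        e = math.exp(_clip(gi + Hi))
        f_over_F = e * math.exp(-e) / -math.expm1(-e)
        s.append(f_over_F if di else -e)        # f/F if delta = 1, -f/(1 - F) otherwise
        w.append(max(f_over_F * e, 1e-8))       # floored to stay positive
    return s, w

def pava(y, w):
    """Weighted nondecreasing least-squares fit (slopes of the convex minorant)."""
    blocks = []                                  # [mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

def icm(v, delta, offset, n_iter=200, tol=1e-9):
    """Iterate the self-induced cumulative sum diagram step for fixed offsets."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    d = [delta[i] for i in order]
    g = [offset[i] for i in order]
    H = [0.0] * len(v)
    ll = loglik(H, g, d)
    for _ in range(n_iter):
        s, w = score_and_weight(H, g, d)
        target = pava([h + si / wi for h, si, wi in zip(H, s, w)], w)
        step = 1.0
        while step > 1e-8:                       # step halving (an added safeguard)
            cand = [h + step * (t - h) for h, t in zip(H, target)]
            ll_new = loglik(cand, g, d)
            if ll_new > ll:
                break
            step /= 2.0
        else:
            break                                # no improving step was found
        converged = ll_new - ll < tol
        H, ll = cand, ll_new
        if converged:
            break
    return [v[i] for i in order], H, ll
```

The returned values are the fitted $\hat H_n$ at the ordered censoring times. In the full algorithm of Section 8.1 this routine would play the role of step S1, with the offsets supplied by the current $\beta$ and h.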

9. Simulation study. To evaluate the finite-sample performance of our
estimators, we conduct a small simulation study with current status data
for the partly linear Cox model. As discussed in Section 1, the Cox model is a
special case of general transformation models, where $H(u) = \log(\Lambda(e^u))$ and
$F(s) = 1 - \exp(-e^s)$. The event times are generated from equation (1.2),
with regression coefficients $\beta_1 = 0.3$ and $\beta_2 = 0.25$. The covariate $Z_1$ is
Uniform[0.5, 1.5] and $Z_2$ is Bernoulli with probability of success 0.5. For
simplicity we take $h(w) = \sin(w/1.2 - 1) - k_0$, with W distributed Uniform[1, 10] and
$k_0 = 0.06516$, and $\Lambda(u) = e^{k_0}(\exp(u/3) - 1)$. Censoring times are standard
exponentially distributed conditional on being in the interval [0.2, 1.8]. For
computational simplicity we leave $c_0$ unspecified and do not use a data
driven mechanism to select $\lambda_n$. Instead, we use $\lambda_n = n^{-1/3}$. $\hat h_n$ and $\hat\Lambda_n$ are
estimated based on the computational strategies discussed in Section 8. We
simulate 200 realizations for sample sizes equal to 400 and 1600.
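One way to generate data of this form is sketched below. It assumes the Cox survival function $P(T > t \mid Z, W) = \exp\{-\Lambda(t) e^{\beta'Z + h(W)}\}$ with $\Lambda$ applied on the same time scale as the censoring times; equation (1.2) is not reproduced here, so the scale and sign conventions are assumptions of this illustration, as are the function names.

```python
import math
import random

K0 = 0.06516
BETA = (0.3, 0.25)

def h(w):
    return math.sin(w / 1.2 - 1.0) - K0

def cum_hazard_inverse(x):
    """Inverse of Lambda(u) = exp(K0) * (exp(u / 3) - 1)."""
    return 3.0 * math.log1p(x * math.exp(-K0))

def truncated_exponential(rng, lo=0.2, hi=1.8):
    """Inverse-CDF draw from a standard exponential conditioned on [lo, hi]."""
    a, b = math.exp(-lo), math.exp(-hi)
    return -math.log(a - rng.random() * (a - b))

def simulate(n, seed=0):
    """Return a list of current status observations (v, delta, z1, z2, w)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1 = rng.uniform(0.5, 1.5)
        z2 = 1 if rng.random() < 0.5 else 0
        w = rng.uniform(1.0, 10.0)
        lin = BETA[0] * z1 + BETA[1] * z2 + h(w)
        # Lambda(T) * exp(lin) is standard exponential under the assumed model.
        t = cum_hazard_inverse(rng.expovariate(1.0) / math.exp(lin))
        v = truncated_exponential(rng)
        data.append((v, int(t <= v), z1, z2, w))   # T itself is never recorded
    return data
```

The constant $k_0$ centres h so that $E h(W) = 0$ when W is Uniform[1, 10], since the average of $\sin(w/1.2 - 1)$ over [1, 10] is approximately 0.06516.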

The summary statistics for our estimators are shown in Table 1. It can be
seen that the sample means are quite close to the true values. The sample
standard deviation for $\hat\beta_1$ based on sample size 400 is 0.284, compared with
0.139 for sample size 1600, resulting in a ratio of 2.04. According to the
asymptotic normality property (Theorem 6), the ratio should be $\sqrt{1600/400} = 2$. Hence,
the ratio estimated from the simulations matches the theory quite well in this
instance. The same property can be observed for estimators of $\beta_2$. Inference
based on the block jackknife, as discussed in Section 7, with the modification
given in Remark 16, is also presented in this table. The 95% confidence
intervals generally have coverage within two Monte Carlo standard errors
($0.03 \approx 2\sqrt{0.05 \times 0.95/200}$) of 0.95, except when m = 40 and the sample
size n = 400. This is possibly because m is too large for the asymptotic
linearity property to hold simultaneously for all m block jackknife estimates at this
sample size, as hinted at in Remark 17.
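The arithmetic behind these comparisons can be checked directly. The short script below uses only the values reported in Table 1; the variable names are ours.

```python
import math

# Root-n convergence implies the SD ratio between n = 400 and n = 1600 is sqrt(4).
ratio_theory = math.sqrt(1600 / 400)
ratio_beta1 = 0.284 / 0.139
ratio_beta2 = 0.168 / 0.083

# Two Monte Carlo standard errors for a 95% coverage estimate from 200 replicates.
two_mc_se = 2 * math.sqrt(0.05 * 0.95 / 200)

# Coverage values from Table 1, keyed by (parameter, sample size, m).
coverage = {
    ("beta1", 400, 10): 0.960, ("beta1", 400, 40): 0.970,
    ("beta1", 1600, 10): 0.960, ("beta1", 1600, 40): 0.970,
    ("beta2", 400, 10): 0.970, ("beta2", 400, 40): 0.990,
    ("beta2", 1600, 10): 0.970, ("beta2", 1600, 40): 0.955,
    ("joint", 400, 10): 0.975, ("joint", 400, 40): 0.990,
    ("joint", 1600, 10): 0.960, ("joint", 1600, 40): 0.955,
}
outside = sorted(key for key, c in coverage.items() if abs(c - 0.95) > two_mc_se)
```

With these inputs the entries falling outside the two-standard-error band are the $\beta_2$ and joint regions at n = 400 with m = 40.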

Histograms of $\hat\beta_1$ and $\hat\beta_2$ and a plot of $\hat\beta_1$ versus $\hat\beta_2$ are shown in Figure 1.
We can see clearly that the marginal distributions and the joint distribution
of $\hat\beta_1$ and $\hat\beta_2$ appear to be Gaussian. Estimates and pointwise 95% confidence
intervals for h and $\Lambda$ based on sample size 1600 and 200 realizations are
shown in Figure 2 and Figure 3. It can be seen that true values for h and
H both lie in the 95% pointwise confidence intervals.

10. Proofs. 

Table 1

Simulation results for the partly additive Cox model with current status data. Sample
sizes are equal to 400 and 1600. Sample means, standard deviations and confidence
region coverages are based on 200 replicates. Confidence intervals are based on the block
jackknife with m = 10 and 40 blocks. The true values of the regression parameters are
$\beta_1 = 0.3$ and $\beta_2 = 0.25$

                                      Sample size 400    Sample size 1600
$\beta_1$   Mean (SD)                 0.297 (0.284)      0.291 (0.139)
            Coverage for m = 10, 40   0.960, 0.970       0.960, 0.970
$\beta_2$   Mean (SD)                 0.247 (0.168)      0.246 (0.083)
            Coverage for m = 10, 40   0.970, 0.990       0.970, 0.955
Joint       Coverage for m = 10, 40   0.975, 0.990       0.960, 0.955

Fig. 1. Histograms of the estimates of $\beta_1$ and $\beta_2$. Scatter plot of $\hat\beta_{1,n}$ versus $\hat\beta_{2,n}$. The
sample size is 1600, based on 200 replicates.

Fig. 2. Estimate and pointwise confidence interval for h. The solid line is the true value.
The dashed line is the estimated mean value. The dotted lines are the pointwise 95%
confidence intervals. The sample size is 1600, based on 200 replicates.

Fig. 3. Estimation and pointwise confidence intervals for $\Lambda$. The solid line is the true
value. The dot-dashed line by the solid line is the estimated mean value. The dashed lines
are the mean value plus (minus) one standard deviation. The dotted lines are pointwise
95% confidence intervals. The sample size is 1600, based on 200 replicates.

Proof of Lemma 1. By the inclusions contained in examples 3 and 4,
and by Remark 2, it suffices to check example 1, example 3 for $\gamma \in (0, \infty)$,
and example 5 for $\gamma \in (1, \infty)$. It is straightforward to verify B5(b) for each of
these distributions. We now verify B5(d). For $F(u) = 1 - \exp[-e^u]$, $f(u) =
e^u \exp[-e^u]$ and $\dot f(u) = -e^u(e^u - 1)\exp[-e^u]$. Thus, $f^2(u) - \dot f(u)F(u) =
e^u \exp[-e^u](\exp[-e^u] + e^u - 1) > 0$ for all $u \in \mathbb{R}$, since $e^{-v} + v$ is strictly
increasing on $(0, \infty)$. Also, $f^2(u) + \dot f(u)(1 - F(u)) = e^u \exp[-2e^u] > 0$ for all
$u \in \mathbb{R}$. Hence, B5(d) is satisfied for example 1. Similar arguments establish
the condition for example 3. Consider now $F(u) = r_\gamma \int_{-\infty}^{u} e^{-|s|^\gamma}\,ds$ for $\gamma \in
(1, \infty)$, where $r_\gamma = \gamma[2\Gamma(1/\gamma)]^{-1}$. Since $f(u) = r_\gamma e^{-|u|^\gamma}$, $\dot f(u) =
-\mathrm{sign}(u)\gamma r_\gamma |u|^{\gamma-1} e^{-|u|^\gamma}$ and, thus,

(10.1)  $f^2(u) + \dot f(u)(1 - F(u)) = r_\gamma^2 e^{-2|u|^\gamma} \left[ 1 - \mathrm{sign}(u) \frac{\gamma |u|^{\gamma-1} \int_u^\infty e^{-|s|^\gamma}\,ds}{e^{-|u|^\gamma}} \right],$

which is clearly > 0 for all $u \in (-\infty, 0]$. Since $\int_v^\infty e^{-s^\gamma}\,ds < \gamma^{-1} v^{-(\gamma-1)} e^{-v^\gamma}$
for all $v \in (0, \infty)$, (10.1) > 0 for all $u \in (0, \infty)$. Similar techniques verify that
$f^2(u) - \dot f(u)F(u) > 0$ for all $u \in \mathbb{R}$ and, thus, B5(d) is satisfied yet again.
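The two closed forms used for example 1 can be verified numerically. The check below is only a sanity test of the algebra for $F(u) = 1 - \exp(-e^u)$, with hypothetical function names, and is not part of the proof.

```python
import math

def F(u):
    return -math.expm1(-math.exp(u))          # 1 - exp(-e^u)

def f(u):
    return math.exp(u - math.exp(u))          # e^u exp(-e^u)

def f_dot(u):
    return -math.exp(u) * (math.exp(u) - 1.0) * math.exp(-math.exp(u))

def check(u, rel_tol=1e-8):
    """Compare both sides of the two identities at the point u."""
    v = math.exp(u)
    lhs1 = f(u) ** 2 - f_dot(u) * F(u)
    rhs1 = v * math.exp(-v) * (math.exp(-v) + v - 1.0)
    lhs2 = f(u) ** 2 + f_dot(u) * (1.0 - F(u))
    rhs2 = v * math.exp(-2.0 * v)
    ok1 = math.isclose(lhs1, rhs1, rel_tol=rel_tol) and lhs1 > 0.0
    ok2 = math.isclose(lhs2, rhs2, rel_tol=rel_tol) and lhs2 > 0.0
    return ok1 and ok2

grid = [-4.0 + 0.25 * i for i in range(25)]   # u from -4 to 2
all_ok = all(check(u) for u in grid)
```

Positivity of both expressions on the grid is consistent with F and 1 - F being log-concave, which is what condition B5(d) is used for later in the proofs.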

Establishing condition B5(c) is more challenging. For F(v) = 1 — exp[— e"], 
F(F~^ (n) + s) =u'^ . Let r/i = inf^g/^ e** , r/2 = sup^g^ and ^ (s) = s^^ . Note 
that ^ : [0, 1] [0, 1] is increasing and isomorphic. Furthermore, —v'^\ = 
\{^-'au)r^ - {r'av)r'\ < {V2/Vimu) - av)\, since miseKVl'e^ = 1 
and supg^j^rji ^e^ = ??2/?/i- Thus, (c) is satisfied for example 1 when a = 1. 
For example 3 with 7 G (0, 00) , F{u) = 1 - [1 +76"]-^/^ and, thus, F{F~^ (u) + 
s) = 1 — [1 — + e'*(l — u)~"']~^^"' . Hence, after some derivation, 

^F{F~Hu) +s) = -{1 + e^[(l - u)-' - l]}-i/^-i[(l - n)-^ - 1] 
OS 7 

< -{l + e^[(l-u)-T-l]}-i/T 

7 

X [1 - (1 - u)^]{(l - + [1 - (1 - u)^]e^}-i 

V e-' 
< = . 



7 

which is uniformly bounded over s £ K. Thus, B5(c) is also satisfied in this 
instance, with c = c* , a = l and ^ equal to the identity. 

It appears to be quite difficult to establish B5(c) directly for example 5 
with 7 G (1, 00), so we will use Lemma 2. In this case, we have that, for u,t > 
0, f{—u)f{—u + t) — f{—u)f{—u + r) < if and only if > sign(n — r) x 
|m — r| — n = — r, but this last inequality is always true. Similarly, f{u)f{u — 
t) — f{u)f{u — r) > if and only if < — sign(u — t)\u — t| + u = r, but, 
again, this last inequality is always true whenever r > 0. Hence, condition (i) 

of Lemma 2 holds with c^^^ = for all r S [0, oo). Establishing condition (ii) 
is more challenging. Condition (ii) is trivially true when r = 0. Fix r G 
(0,oo) and rj G (0,1/3). Let G:[0,oo) [0,1] be strictly increasing with 
F{u) < G{u) for ah n > 0. Then u > G~^{F{u)) for ah u > and, thus, 
1 - F(F-i(l - e) - r) < 1 - F{G~^{1 - e) - r). Note that, for n > 0, 



ds 



roc 

>ci 7(l + r/)sT-ie-(i+'')"^ds 

J u 



-(1+J7)«T 



l-G(n), 



where c\ S (0,oo) does not depend on u and the inequality follows since 
inf <(>o[s~('^~^)e''*^] > 0. Solving for G{u^,) = 1 — e, we obtain = [log(ci/e)/ 
(1 + 1])]^^"' ■ Since it is also true that 1 — F(u) < C2e~^^ when u > for some 
C2 G (0, oo) which does not depend on u, 

1 - F{F-\l - e) - r) < 1 - F{G-^' 



(10.2) 



where 



< C2 exp 



:i-e)-r 
log(ci/e) 



1/7 



C2exp<^ -(1 - ?7) 



1 + 7? 

log(ci/e) 



log(ci/e) 



1/7 



(1-7?) 



1 + 7? 



log(ci/e) 

1 + 7? 



^(l-6)-r)< 



is bounded below for all e small enough. Hence, 1 — F{F 

some G (0, oo) not depending on e, for all e small enough. 
Thus, condition (ii) is satisfied, and the lemma yields that B5(c) is satisfied 
in this setting. □ 



Proof of Lemma 2. Fix a compact C M, and set r = sup{|s| :s G 
K}. Choose ei G (0, 1/3) so that condition (ii) of the lemma is satisfied for all 

e<ei and ^"^(1 - ei) A [-^-^(ei)] >r + 4^\Note that, for all u G [l-ei,l] 
and p G [0, 1 — u] , 



d 



^jF{F-\^i^\n + p})+s)-F{F-\^l^>{u})+s)]l^, 
= fiF-Hci^^{u + p}) +t)- f{F'Hci^^{u}) +t)<0, 
for all t G K, which implies F{F-\^i^\u + p}) + t) - F{F'^{ci^\u}) + 
t) < F{F~^d^\u + p}) - r) - F(F-i(e^^{u}) - r) for ah t G iT. Argu- 
ing in a similar manner, we obtain for all u G [0,ei] and p G [0,u] that 

+ i) - - P}) +t)< F{F-^{ci"\u}) + r) - 

F{F~\^i^^{u - p}) + r) for all t G i^. 

By condition (i) of Lemma 2, we have for all e G [0, ei] that 

sup \FiF-\ci^\m}) + t) - F{F-Hd^\u2})+t)\ 

t£K,ui,U2&[0,ti] : ?i2|<e 



<F{F-\il{e})+T) 



< sup F(F-i(er{u2})+r)-F(F-i(er{wi}) + r) 



and 

sup +t) -F(F-i(e{«2}) + 01 

teK,ui,U2£[l — ei,l] : \ui—U2\<e 

< sup F{F-\(i^\u2})-T)-F{F-\d^\m})-T) 

l~€l<Ul<U2<Ul+e<l 

<l-F{F~Hd^\l-e})-T). 
Since both F and F~^ have bounded derivatives on compacts, 

sup \F{F-\C{ui})+t) - F(F-i(ei"){n2}) + < ce 

tGK,ui,U2£[ti,l—ei] : \ui~U2\<t 

for some c G (0, oo). Hence, 

sup sup \F{F-\ci^\u}) + s) - F(F-i(ei"^{7;}) + s)\ < ce'^, 

se-ffu.uelO,!] : |n-i>|<e 

where c = (1/ei) Vc[^^ Vc and a = Or- Since the compact set K was arbitrary, 

— 1 (t) 

the desired result follows with ^ chosen so that (, =£,* ■ □ 

Proof of Lemma 3. Since l{x;l3,h,H) < 0, A„ > forces /i„ G 
since otherwise lf^{Pn,hn,Hn) = — oo. Since Bq is bounded, is obviously 
bounded. Since hn is also bounded by assumption, lim//|_oo + 
^n(if(i)) + -fT) = and limH^oo F0'^z^n) + K{w{n)) + H) = 1 and, thus, 
lP0n,hn,Hn) = -OO if either Hniv(i)) = -oo or Hn{v(n)) =oo. □ 

Proof of Theorem 1. Define 

$l^*(x; \beta, h, H) = \delta F(\beta'z + h(w) + H(v)) + (1 - \delta)\{1 - F(\beta'z + h(w) + H(v))\}$

and fix 7 G (0, 1). Note that since lf^{(3n, hn, Hn) > lf^{f3o, /iq, Hq), Xf^J^[hn) — 
Op{l) and, thus, J(/i„) =Op(n^/^). Also, F„) > /„(/?o, /lo, -f^o) + 

Op(n~^/'^), which implies by the concavity of s i— > log(s) that 

(10.3) _ _ ^ 

These facts, combined with the result from Lemma 8 below that 

(10.4) J-Q = {[1 + J{h)]--^C{X; P,h,H):p£ Bo,H £Mu,h£ 

is P-Donsker, where is the collection of all nondecreasing functions 
mapping from R to ^ C M, imply that (P„ - P)C{X; p'^Z, hn{W),Hn{V)) = 
Op{n-^l^). Thus, PC,{X] fin, K.Hn) > Op{n-'^/^). However, by the concavity 
of s ^ log(s), PCiX;Pn,hn,Hn) < and, thus, PC{X- K, Hn) = Op(n-V6) 
Since Un{x) = l*{x;Pn,hn,Hn)/l*{x;Po,ho,HQ) satisfies < Un{x) <m<oo 
for some m not depending on n or x, Prohorov's theorem now implies for ev- 
ery subsequence n' that there exists a further subsequence n" so that [/„// {X) 
converges in distribution to some U{X) satisfying Plog{l-)-7(C/(X) — 1)} = 
and PU{X) = 1. But this implies U{X) = 1, almost surely, by the strict 
concavity of s log{l +7(5 — 1)}. Since this result is true for every subse- 
quence, we have that P\Un{X) — 1| = Op(l). This now implies that 

P{F{Hn{V)+P'nZ + hn{W)) 

(10.5) 

- F{Ho{V) + p'oZ + ho{W))}^ = opil). 

Expression (10.5) implies that 

P[{{Pn - M\Z - E[Z\V, W]) + Cn{V, W)Y\V, W] = Op{l), 

for almost surely all V and W, where Cn{V,W) = (/3„ - Po)'E[Z\V,W] + 
Hniy) — Ho(y) + hn{W) — ho{W) is a sequence uncorrelated with Z — 
E[Z\V,W]. Condition A2 now implies that Pn — = Op(l), and, further- 
more, that 

(10.6) P{F{Hn{V) + hn{W)) - F{Ho{V) + h^{W))f = Op{l). 

Let V be the set of all V such that the distribution of W given V dominates 
the unconditional distribution of W . Condition A3, combined with (10.6), 
implies that for some u S V, 

Op{l) = P[{Hn{V) + hn{W) - Ho{V) - ho{W)}^\V = v] 

(10.7) 

> PwAKiWi) - hoiWi) - Pw2hn{W2)f, 
where is the marginal probability measure of W applied to Wj, j = 1, 2. 
Since {h/[l + J{h)] -.he^u} is Donsker (see Theorem 2.4 of [43]), J(/i„) = 
Op(ni/3) and PwMWi) = 0, we have that Pw^hniWi) = Op{n~^/^). Thus, 
the last term in (10.7) implies 

PwAhn{Wi)-ho{Wi)}^ = Op{l) 

and, thus, by condition Al(b), — /lolb = Op(l). This now implies that 

\\Hn-Ho\\F,2 = Op{l). □ 

Proof of Theorem 2. Assume without loss of generality that A(i) = 1 
and A(-„) = as discussed in Remark 6. Divide the observations into contigu- 
ous disjoint segments C {1, . . . , n}, k = 1, . . . , K, where 1 = min(Mi) < 
max(Mi) = min(M2) - 1 < min(M2) < • • • < max(Mfc_i) = min(Mfc) - 1 < 
min(Mfc) < max(Mfc) = n, so that {A(j),j G M^} consists of all I's fol- 
lowed by all O's. Hence, there are at least two observations in each M^, 
k = l,...,K. Note that #„(V(i)) = Hn{V^j)) for all i,j £ M^, k = l,...,K. 
To see this, suppose that it is not true, and let j' be the index in Mj, 
which corresponds to the first time 6(^j) = over j E M^. Now the pro- 
file log-likelihood l^{(3,h,Hn) can be increased by replacing if„ with H*, 
where -fr*(V(j)) = Hn{V(^j'-)) for all i <j',i £ [since this would increase 
the value of \og{F{Hn{V^^^) + + K{W(^f^))]]. The profile log-likelihood 

will also be increased by setting //*(V(j)) = HniV^')) for all i'> £ 
[since this would lower log{F(i7„(V(j)) + (i'nZ{i) + ^n(^(i)))} and, hence, 
increase log{l - F{Hn{V(e,) +fr.Z^i)+K{W(i^))}]. Thus, lP0n,hn,H^) > 
lP{(5n,hn,Hn), and the MLE iJ„(V(j)) is therefore constant over the indices 
i£Mk, k = l,...,K. 

Define vq = HQ{ly) and po = F{vo — 2m) A [1 — F{vo)], where m is the 
maximum possible value of + /in(w^(i))| over 1 <i <n, and let q{6, t) = 

5 log(F(t)) + (1-6) log(l -F(i)). Note that condition 5(d) implies that q{5, t) 
is strictly convex over t G M for 5 G {0, 1}. Accordingly, i?„(V(i)) is piecewise 
constant for all indices in M^, k = 1, . . . , k* , for some k* < K and, thus, for 
any e £ (0,po), 

P{H^{V^,))<F~\e)} 



< Pi inf 

l<j<K 



argmax(^ ^ A(^i)log F {a + + hn{W(^i))) 



Kl = li£M, 



+ (1 - A(,)) log{l - F{a + + K{W^,)))} 



(10.8) 
<P< 



sup 



,a<F-He),l<j<K [/=! iGM, 

+ (1- A(,))log 



EE ^0 log 



F(a + /3;Z(,) + /i„(^(,))) ■ 
F{vo -m + ()'^Z(^) + K{W(i))) . 



l-F(a + /3;Z(,)+/i„(T^(i))) 



1 _ F[vQ -m + + KiWi^i))) 



>0 



sup sup 

a<F-i(e) l<J<n 



^^(A(,)logF(a + 

.1=1 i£Mi 



mj 



logpo) 



>0 



< 



pl ■ f n < log(l/po) 
li<i<n (^'^ - log[l/F(F-i(e) + m)] 



where Q(j) = j ^ X^Li ^(i) ■ 

Let Q* = {j + l)~i [1 + Eti Q*j\^ where Q\,Ql,... are i.i.d. Bernoulh with 
probability of success F{HQ{ly) — m). Then (10.8) is bounded above by 



(10.9) 



pf. r ^* ^ log(l/Po) 
b>i^^' - log[l/F(F-i(e) +m)] 



For every r G (0, F{Hq{1^) — m)), the strong law of large numbers yields that 
Nr = sup{j : J2i=i Qj < t} is a bounded random variable. This now im- 
plies that the probability in (10.8) can be made arbitrarily small by taking e 
small enough. Hence, |i/„(V(i))| = Op(l). The proof that |i/n(V(„))| = Op{l) 
is obtained in a virtually identical manner after reversing the order of the 
indices. □ 



Proof of Theorem 3. By the isotonic regression results in Section II. 1.1 
of [19], 

Hn{Vn))= min ^'=1^^) < 1 

^^>^ l<k<n k ~ Mn 

where M„ = max{j < n : J2i=i ^(i) = !}• this setting A(i) = 1 and A(„) = 

almost surely (by assumption); but A(2) , • • • , A(„_i) conditional on V(2) i • • • > 
V{n-i) are independent Bernoullis with probabilities of success Go(V(j)), 

1 = 2, . . . ,n — 1. Thus, M„ is bounded below in probability by M* = 1 + 
max{j <n — l:X)i=i'^i = 0}) where the A^'s are i.i.d. Bernoullis with 
probability of success Go{uy) < 1. Let M* = lim„^oo-^n- Since for any 
k < oo, P{M* > A;} = {1 - Go(n„)}''~^ > 0, we now have for any > 
that limmf„^oo-P{G„(V(i)) < r?} > P{l/M* < ??} > 0. Thus, (i) follows. The 
same argument can be used to verify (ii) after noting that 

1 & /T/ \ 1 YA=k^{i) . I]i=l{l - ^(n-j+1)} 

1 — Hn \ V(„-\ ) = 1 — max ; — — = mm ; — ^ —. r-i 

^ ^"^^ i<k<nn-k+l i<k<n k n 

Proof of Theorem 4. We make use of the following technical tools. 

Tl. Denote Q = {6 = g + H, g £ (&, H e E}, where E = {H:H is a nonde- 
creasing function and — cxd < A/i < H < M2 < 00}, for constants Mi and 
M2, and where & = {g:g = p'z + h{w) : P e Bq, \h\ < cq, J{h) < 00}. The 
arguments in Lemma 8 yield that log A^j.](e, 0/(1 + J{h)),P) < Aie~^, 
where Ai is a constant. 

T2. (Theorem in [43], page 79.) Consider a uniformly bounded class of func- 
tions G, with supggg \g — go\oo < 00 and logA^[.](e,^,P) < Ae~°' for all 
e > 0, and where a £ (0, 2). 
Then for (5„ = n-^/^^+a)^ 

\{Fn-P){9-90)\ _g.^l/2. 

Wg-goh V V^n 
where || • II2 is the L2{P) norm. 

Denote 9n{x) = fi'^z + hniw) + Hn{v), 6o{x) = (3qZ + hQ{w) + Hq{v) and 
q{5,t) = <51og(F(t)) + (1 - (^)log(l - F(i)). Then l{(3,h,H){x) = q{6,eix)). 
Denote the second-order derivative of q as —m{5,t) = (d"^ /{dt'^))q{6,t). 

Since {go,Ho) maximizes the expectation of the log-likelihood function, 
we have 



(10.10) P[ligo,Ho)-l{gr„Hn)]=P 



where t*[X) is on the line segment between 6n{X) and 6q{X). From the 
compactness of On and 6*0 , as given in the assumptions and as a consequence 
of Theorem 2, 

(10.11) 3ei,e2 :0 < ei < £2 < 00 and ei < m((5, t*) < £2 a.s. 
Combining (10.10) and (10.11), 

(10.12) - 641 < P[li9o,Ho) - l{gn,Hn)] < e2\\0n - OoWl 
The penalized MLE estimators satisfy 

(10.13) 

+ (p„ - p)[i{en) - m)] + p[i{en) - m)], 
where 9n{x) = (3'nZ + hn{w) +Hn{v). However, if we let k and I be as defined 
in Remark 7, then 



Al{0n)-l{0n)]=n-^ 



■fc-1 



^ log{l - F(^„(x(i)) + Hr,{y[k)))} 



+ J2 logF{gn{X(^i))+Hn{yil))) 

1=1+1 

by arguments given in the proof of Theorem 2. This, combined with (10.12) 
and (10.13), yields 



(10.14) 



< XU^iho) + (P„ - P)[l{6n) - m)] + Op(n-i). 
Combined with the results in Tl and T2 (for a = 1), (10.14) gives us 
Xir{hn) + ei\\ 



,1/2 



Vn-^/^). 



(10.15) 



< Xlj\h^) + Op{n~^'^){l + J{Km\en - (70112 

Thus, we conclude 

XlJ\K) < XlJ\ho) 

+ Op{n~'/^){l + J(/i„))(||4 - OoWl^^ V n~^/6), 
as well as 

£l\\Gn — Go\\2 < A^J^(/lo) 

+ Op(n-V2)(i + JiKMlOn - eoWl^' V n-V6). 

A few further calculations give us J{hn) = Op{l) and ||^n — ^olb = Op{n~^/^). 
Following arguments similar to those used in the proof of Theorem 1, we con- 
clude ||/3„-/3o||+c?((/in,-ffn), {ho, Ho)) = Op(n"^/^). Moreover, since J{hn) = 
Op{l), we can now conclude uniform consistency of /i„ as discussed in Re- 
mark 8. □ 



(10.16) 



Proof of Lemma 4. Note that the model assumptions ensure that 
Do is bounded above and below on x [a,b]. Let Si be the the class 

of functions g: [lv,Uv] ^ ^ with E[g'^ {V)Do{V,W)] < oo, and let ^2 be the 
class of functions g:[a,b] i-^R with E[g'^{W)Do{V,W)] < oo. Because Dq 
is bounded above and below, it is not hard to show that the score spaces 
cSo* = {giV,W)Q^,iX):ge So} and Sf = {giV)Q^,{X) : g G Si} are closed 
in L2{P), and that the L2{P) closure of the score space {g{W)Q^^{X) -.g G 
Qu,o} is S2 = {g{W)Q^fj{X) : g £ 52}. The reason we can drop the require- 
ment Eg{W) = in the latter case is that E[Q^g{X)\V,W] = 0. We also 
note that now both and 5| are closed subspaces of Sq. We can also see 
that, for any g(y,W)Q^^{X) £ Sq, {Ilig){V)Q^,f^{X) is the projection onto 
and {Il2g)iW)Q^g{X) is the projection onto 

Define a new score space 5| = {[^(l^) + h{W)]Q^g{X) ig £ Si,h £ 82}- 
Since Dq is bounded below and V and W are independent, there exists a 
constant c > such that for all g £ Si with E[g[V)] = and all h £ S2, 

E{[g{V) + hiW)fQl^{X)} > cE[g{V) + h{W)]' > cEg\V) + cEh\W). 

Thus, ^3 is also a closed subspace of 5q. This means that we can use the 
alternating projections theorem (Theorem A. 4. 2 of [6]) to establish that 
there exist a. q' £Si and an h' £ S2 so that D*{V, W) - q'{V) - h'{W) = 
lim^_oo[(l -n2)(l -ni)]™L>* and 

(10.17) E{[D*{V, W) - q'{V) - h' {W)MV) + h{W)\Ql^{X)] = 0, 
for all q£Si and h£ S2- If we can show that 



and that h* = h' , then the first expression in Dl will hold for all h £ '^ufi- 
This last assertion follows by setting q{V) = — {Ilih){V) in (10.17) and not- 
ing that q'(y)Q^o{X) is uncorrelated with [h{W) - (Ilih){V)]Q^a{X) for 
any h£ S2D 

We first establish that h' = h* . For each k>0, let q^ £ Si and /i^ £ S2 be 
defined by the equation 



To see that this makes sense, begin with go = /lo = 0, and note that, for any 



A: > 0, (1 - Ui)[D* {V, W) - qk{V) - hk{W)] = D* - {UiD*){V) + {Uih,,){V) - 
hk{W) and (1 - U2)[D* - qk+i{V) - hk{W)] = D* - qk+i{V) - {U2D*){W) + 
{U2qk+i){W). Thus, by setting qk+i(.V) = {UiD*){V) - {Uihk){V) and 

hk+i{W) = iU2D*){W) - {U2qk+i){W) 



we have a method of defining and /i^ which is consistent with (10.19). By 
solving the recursive formula in (10.20), we obtain that hk{W) = 
Ej=d(n2ni)-?] x n2(l - Ui)D* for any A: > 1. Thus, h' in (10.17) is the 
limiting value of hk, as — > 00, where the limit is in ^2 since S^ is closed. 
But this is precisely how h* is defined. Thus, h' = h* , and the limit in the 
definition of h* is well defined. 



(10.18) 




(10.19) 



D*iV, W) - qk{V) - h,iW) = [(1 - n2)(l - UitD*. 



(10.20) 



[n2(i - Ui)D*]{w) + {U2Uihk){w) 
Now we will establish (10.18). Recall from above that G L2{V). Note 
also that the above recursive arguments imply that h* = II2D* — II2Q' ■ Thus, 
if we let Pi be the probability measure for V, 



|/i*||p,, = ||n2Z)*-n2(?'||p,. 



< ||n2Z?*||p.+ 



ly 



q'{v)S2iv,w)dPi{v) 



1/2 



< 00 

by the boundedness assumptions on D2^'^^ and Since the density of W 
is bounded below, we now have that £[{d / {dw)Y h* (w)]^ dw < 00. Thus, 
the first part of Dl is established and all that remains is to establish the 
required differentiability of q. 

Because of the asumptions about D^, all that remains is establishing that 
{Ilih*)(v) has a derivative which is uniformly bounded on [/t,,u^]. Letting 
P2{w) be the density of VF, we have that (Jiih*){y) = h{w)Ri{v, w)p2{w) dw. 
Thus, 

/ fb \i/2 
\{Uih*){v)\ < / h^{w)p2{w)dw X E[Ri{v,W)f 



and the desired result follows from the assumptions on Ri. This completes 
the proof. □ 

Proof of Theorem 5. The information calculation is based on the 
non-orthogonal projection approach discussed by Sasieni [37]. The log-likeli- 
hood function takes the form 

$l(x; \beta, h, H) = \delta \log\{F[\beta'z + h(w) + H(v)]\} + (1 - \delta) \log\{1 - F[\beta'z + h(w) + H(v)]\}.$

The score function for (3 is simply the derivative of the log-likelihood with 
respect to f3, which is 

where 9^{x) = (5' z + h{w) + H{v). 

Assume hrj{w) = h{w) +rj^{w), where ^ G 9i/,o- Then {d/drj)hj^{w) = ^{w). 
Thus, the score operator for h{w) is 

6 1-6 



F{e^{x)) i-F{e{x)) 
For the H part, assume $(\partial/\partial\eta)H_\eta(v) = a(v)$, where $a \in L_2(V)$, and where
$L_2(V)$ is the set of functions of the random variable V which are square-
integrable. Then

$\dot l_H(a)(x) = a(v) f(\theta_\psi(x)) \left[ \frac{\delta}{F(\theta_\psi(x))} - \frac{1 - \delta}{1 - F(\theta_\psi(x))} \right].$
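The bracketed factor common to these score operators is the derivative of $\delta\log F(t) + (1 - \delta)\log\{1 - F(t)\}$ with respect to t. A quick numerical check for the Cox-type choice $F(t) = 1 - \exp(-e^t)$ is sketched below; the function names are ours and the choice of F is an assumption taken from Section 9.

```python
import math

def F(t):
    return -math.expm1(-math.exp(t))          # 1 - exp(-e^t)

def f(t):
    return math.exp(t - math.exp(t))          # density of F

def q(delta, t):
    """Log-likelihood contribution as a function of the linear index t."""
    return delta * math.log(F(t)) + (1 - delta) * math.log(1.0 - F(t))

def Q(delta, t):
    """Analytic derivative of q in t: f(t) [delta / F - (1 - delta) / (1 - F)]."""
    return f(t) * (delta / F(t) - (1 - delta) / (1.0 - F(t)))

def score_H(a_v, delta, theta):
    """Score in the direction a for the transformation part, evaluated at one point."""
    return a_v * Q(delta, theta)

def max_abs_error(step=1e-5):
    """Largest gap between Q and a central finite difference of q on a small grid."""
    worst = 0.0
    for delta in (0, 1):
        for t in (-2.0, -1.0, 0.0, 0.5, 1.0):
            numeric = (q(delta, t + step) - q(delta, t - step)) / (2.0 * step)
            worst = max(worst, abs(numeric - Q(delta, t)))
    return worst
```

The scores for $\beta$ and h have the same form, with a(v) replaced by z and by the perturbation direction in h, respectively.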



Step 1. As suggested by Sasieni [37], we first project IpiX) onto the space 
generated by Ih{X). We will need to find a function a* G L2{y) so that 
— luia*) -L Inio) for all a G L2(F), which is equivalent to requiring 

(10.21) E[{Z-a*{V))a{V)Ql{X)\=Q 

for all a G L2(T^), where is as defined in assumption Dl. Since E[[Z — 
a*{y))a{V)Ql{X)] = E[a{V)E{{Z - a* {V))Ql{V)\V)] = 0, then if E{{Z - 

a*)Q'^\V) = almost surely, (10.21) will definitely be true. Thus, we can 
conclude that 



(10.22) a*{v) = 
Hence, we have 

(10.23) lp{X)-iH{a*){X) 



E{ZQl{X)\V 
E{Ql{X)\V- 



E{ZQl{X)\V) 
E{Ql\V) 



Qi>{V). 



Step 2. We next project lh{i) onto the space generated by In, using similar 
calculations, to obtain a 6* G L2{V) so that 



(10.24) k{X) - lH{h*){X) = Q^{X){ i{W) 



E{i{W)Ql{X)\V)^ 



E{Ql{X)\V) 



Step 3. Next, we project the space generated by — Ih{o*) onto the space 
generated by Ih — I nib*)-, which is equivalent to finding h G 9^,0 such that 



E 



Z 



E{ZQl{X)\V) 
E{Ql{X)\V) 



h{W) 



E{h{W)Ql{X)\V) 



E{Ql{X)\V) 



h{W) 



E{h{W)Ql{X)\V) 



E{Ql{X)\V) 







for all h G 9j/,o- That this is equivalent to the first conditional expectation 
in condition Dl follows from the fact that r{y)Q^,^^{X) is uncorrelated with 



h{W) 



E{h{W)QliX)\V) 



E{Q^,{X)\V) 
for any r G L2{V) and any h G L2(W). 
This proves Theorem 5. □ 



Proof of Theorem 6. The proof of asymptotic normality and efficiency
is based on Theorem 7 below, which is a modification of van der
Vaart's Theorem 25.54 in [47].

As shown in Theorem 5, the efficient score function for /3 takes the form 
[z — h{w) — q{v)]Q^g{x), where h, q and are as defined in assumption Dl. 
Formally, this function is the derivative at t = of the log-likelihood function 
evaluated at (/3o + t,ho — th, Hq — tq). However, the last coordinate of the 
latter path may not define a nondecreasing function for every t in a neigh- 
borhood of and, hence, cannot be used to obtain a stationary equation 
for the maximum likelihood estimator. To overcome this difficulty, we will 
replace the efficient score with an approximation based on an approximately 
least-favorable submodel. 

For t E M"^, define Ht{v) = H{v) -t'q[HQ^{Ho{a) V [H{v) A Ho{b)])]. Then 
for t close enough to zero, Ht defines a nondecreasing function, since v i-^ 
q{HQ^{v)) is Lipshitz continuous on [HQ{a), HQ{b)] as a consequence of con- 
dition Dl and the assumed differentiability of Hq^ . Now plug (/3 + t,hQ — 
th, Ht) into the log-likelihood function and differentiate with respect to t at 
t = 0. We then get the score function k^{X) = {Z - h{W) - q[H^^{HQ{a) V 
[H{y) f\ HQ{h)\)\)Q^{X), for which k^^ is the efficient score for [3 at ^o- 

We now have the following results. 

1. The model is differentiable in quadratic mean with respect to (3 at (/3o, h^, 
Ho). 

2. As shown in Theorem 5, the efficient information matrix is nonsingular. 

3. Note that, by the uniform consistency of we have for large enough 
n that hn is in the interior of with high probability and satisfies 
Fnhn{W) = 0. Hence, the derivative in the direction h — Fnh(W) of the 
log-likelihood will be zero for large enough n. This implies that 

P„L ~ ^ -¥r,[h]Fn[Q. ^ g ]=0. 

Since \^n{Q a u u ~ Q a u tj )\ — Op(n~^) by arguments given in the 

h'ni'l'nj-n-n Pn I'l'ni-n-Ti 

proof of Theorem 2 and since P^Q^ h u — ^p{^)j ^'^^^ have that 
"^rikf, I Tj = Op(n"^/2). Hence, also P„A; A ? ~ =Op{n~^/'^). 

4. {f3n,hn, Hn) is Consistent for {(3o,hQ, Hq) (in an L2 sense for Hn) and 
asymptotically bounded. 

5. Since J{hn) = Op(l), and by results given in the proof of Theorem 5 and 
the Lipschitz continuity of the function ip k^, we can see that there 
exists a neighborhood of {P(),hQ, Hq) such that the functions k~ ~ ~ 






belong to a PpQ^ho,HQ Donsker class with a square integrable envelope 
function with high probability for large enough n. 
6. Denote $\zeta = (h, H)$, with $\zeta_0 = (h_0, H_0)$ and $\hat\zeta_n = (\hat h_n, \hat H_n)$. Let
$B_{\beta,\zeta}$ be the score operator for $\zeta$ under the assumed model, but without
the requirement that $P_{\beta,\zeta} h(W) = 0$. From the orthogonality of $k_{\beta_0,\zeta_0}$
and $B_{\beta_0,\zeta_0} D$ for any direction $D$ in the domain of the score operator, we
can now write

(10.25)  $-\int k_{\beta_0,\zeta_0}\,\bigl[p_{\beta_0,\hat\zeta_n} - p_{\beta_0,\zeta_0} - B_{\beta_0,\zeta_0}(\hat\zeta_n - \zeta_0)\,p_{\beta_0,\zeta_0}\bigr]\,d\mu,$

where $\mu$ is a suitable dominating measure. To verify the "no bias" condition
(10.27) of Theorem 7 below, we use the decomposition (10.25). By
the boundedness of the second derivative of $\log p_{\beta,\zeta}$ in a neighborhood
of $(\beta_0, \zeta_0)$, the first term on the right-hand side of (10.25) is bounded by
$O_P(1)\,d(\hat\zeta_n,\zeta_0)\,[d(\hat\zeta_n,\zeta_0) + \|\hat\beta_n - \beta_0\|]$; the second term on the
right-hand side is bounded by $O_P(1)\,d^2(\hat\zeta_n,\zeta_0)$; and the third term is
bounded by $O_P(1)\,d(\hat\zeta_n,\zeta_0)\,\|\hat\beta_n - \beta_0\|$. Thus, (10.27) follows from the
fact that $d^2(\hat\zeta_n,\zeta_0) = O_P(n^{-2/3})$ by Theorem 4.
7. $P_{\beta_0,h_0,H_0}\|k_{\hat\beta_n,\hat h_n,\hat H_n} - k_{\beta_0,h_0,H_0}\|^2 \to 0$ in probability as a
consequence of the previously stated Lipschitz continuity of $\psi \mapsto k_\psi$ and
the consistency results in item 4 above. Furthermore,
$P_{\hat\beta_n,h_0,H_0}\|k_{\hat\beta_n,\hat h_n,\hat H_n}\|^2 = O_P(1)$ from the boundedness assumptions.
Thus, condition (10.28) of Theorem 7 below is satisfied.

Now all the conditions of Theorem 7 below are satisfied and, hence, $\hat\beta_n$ is
efficient for $\beta_0$. □
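
The rate arithmetic behind item 6 can be spelled out as follows. This is only a sketch: it assumes that (10.27) has the same form as the no-bias condition of Theorem 25.54 of [47], that is, a remainder of order $o_P(n^{-1/2} + \|\hat\beta_n - \beta_0\|)$, and it writes $d_n = d(\hat\zeta_n, \zeta_0)$ as shorthand that is not the paper's own notation.

```latex
\begin{align*}
d_n^2 &= O_P(n^{-2/3}) = n^{-1/2}\,O_P(n^{-1/6}) = o_P(n^{-1/2}),\\
d_n\,\|\hat\beta_n - \beta_0\| &= O_P(n^{-1/3})\,\|\hat\beta_n - \beta_0\|
   = o_P(\|\hat\beta_n - \beta_0\|),\\
O_P(1)\,d_n\,[d_n + \|\hat\beta_n - \beta_0\|] + O_P(1)\,d_n^2
   + O_P(1)\,d_n\,\|\hat\beta_n - \beta_0\|
  &= o_P\bigl(n^{-1/2} + \|\hat\beta_n - \beta_0\|\bigr).
\end{align*}
```

The $n^{-1/3}$ rate of Theorem 4 is therefore just fast enough: any rate faster than $n^{-1/4}$ for $d_n$ would give the same conclusion.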

Proof of Lemma 5. Fix $m > d$. Without loss of generality, we can
assume by the i.i.d. structure that $X_i^* = X_i$ for $i = 1, \ldots, N_{m,n}$. Let

$\epsilon_{0,n} = \sqrt{n}(\hat\beta_n - \beta_0) - n^{-1/2}\sum_{i=1}^{n}\phi_i$ and

$\epsilon_{j,n} = (N_{m,n} - m)^{1/2}(\hat\beta^*_{n,j} - \beta_0) - (N_{m,n} - m)^{-1/2}\sum_{i \in K_{j,n}}\phi_i,$

where $K_{j,n} = \{1, \ldots, n\} - \{j, m + j, 2m + j, \ldots, (k_{m,n} - 1)m + j\}$, for
$j = 1, \ldots, m$; and note that $\max_{0 \le k \le m}|\epsilon_{k,n}| = o_P(1)$ by asymptotic
linearity. Now let $Z^*_{j,n} = k_{m,n}^{-1/2}\sum_{i=1}^{k_{m,n}}\phi_{(i-1)m+j}$, for $j = 1, \ldots, m$, and define
$\bar Z^*_n = m^{-1}\sum_{j=1}^{m} Z^*_{j,n}$. Thus,

$S^*_n = (m - 1)^{-1}\sum_{j=1}^{m}(Z^*_{j,n} - \bar Z^*_n)(Z^*_{j,n} - \bar Z^*_n)^T + o_P(1).$

Hence, $S^*_n$ and $\sqrt{n}(\hat\beta_n - \beta_0)$ are jointly asymptotically equivalent to
$S_m = (m - 1)^{-1}\sum_{j=1}^{m}(Z_j - \bar Z_m)(Z_j - \bar Z_m)^T$ and $Z_0$, respectively, where
$\bar Z_m = m^{-1}\sum_{j=1}^{m} Z_j$ and $Z_0, \ldots, Z_m$ are i.i.d. mean zero Gaussian deviates with
variance $E[\phi\phi^T]$. Now the results follow by standard normal theory (see
Appendix V of [39]). □
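
The block construction used in this proof can be illustrated with a small Python sketch for a scalar estimator. Several things here are assumptions rather than the paper's own definitions: the function name `block_jackknife_variance` is hypothetical, the data are taken to be already in random order, only the first $N_{m,n} = m\,k_{m,n}$ observations are used, and the statistic is scaled by $(m-1)k_{m,n}$, a choice made so that for the sample mean it coincides exactly with $(m-1)^{-1}\sum_j (Z^*_j - \bar Z^*)^2$, where $Z^*_j$ is $k_{m,n}^{-1/2}$ times the sum over block $j$.

```python
import math
import random


def block_jackknife_variance(data, m, estimator):
    """Block jackknife estimate of the asymptotic variance of
    sqrt(n) * (estimate - truth) for a scalar estimator (illustrative sketch).

    data      : list of i.i.d. observations, assumed already randomly ordered
    m         : fixed number of blocks
    estimator : function mapping a list of observations to a float
    """
    k = len(data) // m          # k_{m,n}: largest integer with m * k <= n
    sample = data[: m * k]      # the N_{m,n} = m * k observations actually used

    # Block j (0-based) holds positions j, m + j, 2m + j, ...; drop it and refit.
    leave_out = []
    for j in range(m):
        kept = [x for i, x in enumerate(sample) if i % m != j]
        leave_out.append(estimator(kept))

    centre = sum(leave_out) / m
    # Assumed scaling (m - 1) * k; see the lead-in for why it was chosen.
    return (m - 1) * k * sum((b - centre) ** 2 for b in leave_out)


def sample_mean(values):
    return sum(values) / len(values)
```

Calling `block_jackknife_variance(x, 5, sample_mean)` on a list `x` returns a value on the scale of the variance of a single observation. With $m$ held fixed it does not converge to a constant, which is why the limiting distribution in the lemma involves the random matrix $S_m$ rather than a fixed variance.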

Lemma 8. The class $\mathcal{F}_0$ in expression (10.6) is $P$-Donsker.

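Proof. Let $k_0$ be the maximum possible value of $|\beta'Z + h(W)|$, whose
existence is guaranteed by conditions A1(a), B2 and B3. Let $K$ in condition
B5(c) be $[-k_0, k_0]$, and let $c_1$, $\alpha_1$ and $\xi_1$ be the choices of $c$, $\alpha$ and $\xi$ which
satisfy the condition for this $K$. By condition B5(b), $F$ is one-to-one and,
hence, $\mathcal{M}_{\mathbb{R}} = \{F^{-1}(\xi_1^{-1}G) : G \in \mathcal{M}_{[0,1]}\}$. Thus,

$$\mathcal{F}_0 = \left\{ \frac{\zeta(X; \beta, h, F^{-1}(\xi_1^{-1}G))}{1 + J(h)} : \beta \in B_0,\ G \in \mathcal{M}_{[0,1]},\ h \in \mathcal{H} \right\},$$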

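where $\zeta$ is as defined in (10.3). Furthermore, for any $G_1, G_2 \in \mathcal{M}_{[0,1]}$,
$\beta_1, \beta_2 \in B_0$ and any $h_1, h_2 \in \mathcal{H}$, we have

(10.26)
$$\begin{aligned}
&\left| \frac{\zeta(X; \beta_1, h_1, F^{-1}(\xi_1^{-1}G_1))}{1 + J(h_1)} - \frac{\zeta(X; \beta_2, h_2, F^{-1}(\xi_1^{-1}G_2))}{1 + J(h_2)} \right| \\
&\quad \le m\,|\beta_1'Z - \beta_2'Z|
 + \frac{\bigl|\zeta(X; \beta_1, h_1, F^{-1}(\xi_1^{-1}G_1)) - \zeta(X; \beta_1, h_1, F^{-1}(\xi_1^{-1}G_2))\bigr|}{1 + J(h_1)} \\
&\qquad + \left| \frac{\zeta(X; \beta_2, h_1, F^{-1}(\xi_1^{-1}G_2))}{1 + J(h_1)} - \frac{\zeta(X; \beta_2, h_2, F^{-1}(\xi_1^{-1}G_2))}{1 + J(h_2)} \right|,
\end{aligned}$$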

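where $m < \infty$ by B5(b) and the form of $\zeta$. By B5(c), the second term on
the right-hand side of (10.26) is bounded above by $c_1|G_1(V) - G_2(V)|^{\alpha_1}$.
Defining $\tilde h_j = h_j/(1 + J(h_j))$, $j = 1, 2$, there exist constants $0 < c_2, c_3 < \infty$
so that, for the last term on the right-hand side of (10.26),

$$\begin{aligned}
&\left| \frac{\zeta(X; \beta_2, h_1, F^{-1}(\xi_1^{-1}G_2))}{1 + J(h_1)} - \frac{\zeta(X; \beta_2, h_2, F^{-1}(\xi_1^{-1}G_2))}{1 + J(h_2)} \right| \\
&\quad \le \frac{c_2\,|h_1(W) - h_2(W)|}{1 + J(h_1)} + c_3 \left| \frac{1}{1 + J(h_1)} - \frac{1}{1 + J(h_2)} \right| \\
&\quad \le c_2\,|\tilde h_1(W) - \tilde h_2(W)| + \{c_3 + c_2|h_2(W)|\}\,\frac{|J(h_1) - J(h_2)|}{(1 + J(h_1))(1 + J(h_2))},
\end{aligned}$$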



where $|h_2(W)| \le c_0$ by constraint. Let $N_{[\,]}(\epsilon, \mathcal{F}, Q)$ be the bracketing
number for the class $\mathcal{F}$ using $L_2(Q)$ brackets of size $\epsilon$. Note that the
minimum number of points $S = \{s_1, \ldots, s_N\} \subset [0, \infty)$ needed to ensure that
$\sup_{r \in [0,\infty)} \inf_{s \in S} |r - s|/[(1 + r)(1 + s)] < \epsilon$ is $O(\epsilon^{-1})$. Combining this with
the facts that $\log N_{[\,]}(\epsilon, \mathcal{M}_{[0,1]}, Q) \le c_4\epsilon^{-1}$, where $c_4$ does not depend on
$Q$ (Theorem 2.7.5 of [48]), and that both $\{h/(1 + J(h)) : h \in \mathcal{H}\} \subset \tilde{\mathcal{H}} =
\{h \in \mathcal{H} : J(h) \le 1\}$ and $\log N_{[\,]}(\epsilon, \tilde{\mathcal{H}}, Q) \le c_5\epsilon^{-1}$, where $0 < c_5 < \infty$ does
not depend on $Q$ (see Theorem 2.4 of [43]), we have that
$\log N_{[\,]}(\epsilon, \mathcal{F}_0, P) \le c_6\epsilon^{-1/\alpha_1}$ for some $0 < c_6 < \infty$ and all $\epsilon \in (0, 1)$. Since this
implies that the entropy integral with bracketing is bounded, the desired
result follows. □
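
The covering claim used in this proof rests on the identity $|r - s|/[(1+r)(1+s)] = |1/(1+r) - 1/(1+s)|$, which turns the problem into covering the interval $(0, 1]$ with an evenly spaced grid. The Python sketch below constructs one such net with $\lceil 1/\epsilon \rceil$ points and checks it numerically. The function names are hypothetical, and the construction only exhibits the $O(\epsilon^{-1})$ upper bound; it is not claimed to be the minimal net.

```python
import math


def reciprocal_net(eps):
    """Points s_1, ..., s_N in [0, inf), N = ceil(1/eps), such that every
    r >= 0 has some s_i with |r - s_i| / ((1 + r) * (1 + s_i)) < eps."""
    n_points = math.ceil(1.0 / eps)
    # Grid t_i = i / N on (0, 1], mapped back through s = 1/t - 1.
    return [n_points / i - 1.0 for i in range(1, n_points + 1)]


def worst_gap(points, r_values):
    """Largest distance from any r in r_values to its nearest net point."""
    def dist(r, s):
        return abs(r - s) / ((1.0 + r) * (1.0 + s))
    return max(min(dist(r, s) for s in points) for r in r_values)
```

For example, `reciprocal_net(0.1)` returns ten points ranging from 9 down to 0, and `worst_gap` evaluated on any collection of nonnegative values of `r` stays below 0.1, including for very large `r`, where the distance to the point 9 approaches but never reaches 0.1.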

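Theorem 7 (Modification of Theorem 25.54 of [47]). Suppose that the
model $\{P_{\beta,\zeta} : \beta \in B_0\}$ is differentiable in quadratic mean with respect to $\beta$
at $(\beta_0, \zeta_0)$ and let the efficient information matrix $\tilde I_{\beta_0,\zeta_0}$ be nonsingular.
Let $(\beta, \zeta) \mapsto k_{\beta,\zeta}$ be an estimating function satisfying $k_{\beta_0,\zeta_0} = \tilde\ell_{\beta_0,\zeta_0}$, where
$\tilde\ell_{\beta_0,\zeta_0}$ is the efficient influence function for $\beta$ at $(\beta_0, \zeta_0)$. Let $\hat\beta_n$ satisfy
$\sqrt{n}\,\mathbb{P}_n k_{\hat\beta_n,\hat\zeta_n} = o_P(1)$ and be consistent for $\beta_0$. In addition, suppose there
exists a $P_{\beta_0,\zeta_0}$-Donsker class with square-integrable envelope function that
contains every function $k_{\hat\beta_n,\hat\zeta_n}$ with probability tending to 1. Assume further
that $k_{\beta,\zeta}$ satisfies

(10.27)  $P_{\hat\beta_n,\zeta_0}\,k_{\hat\beta_n,\hat\zeta_n} = o_P\bigl(n^{-1/2} + \|\hat\beta_n - \beta_0\|\bigr)$

and

(10.28)  $P_{\beta_0,\zeta_0}\|k_{\hat\beta_n,\hat\zeta_n} - k_{\beta_0,\zeta_0}\|^2 = o_P(1), \qquad P_{\hat\beta_n,\zeta_0}\|k_{\hat\beta_n,\hat\zeta_n}\|^2 = O_P(1).$

Then $\hat\beta_n$ is asymptotically efficient at $(\beta_0, \zeta_0)$.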

Proof. The proof is almost identical to van der Vaart's proof of his 
Theorem 25.54 in [47]. □ 